Knowledge and Information Systems, Volume 44, Issue 2, pp 359–383

The Author-Topic-Community model for author interest profiling and community discovery

  • Chunshan Li
  • William K. Cheung
  • Yunming Ye
  • Xiaofeng Zhang
  • Dianhui Chu
  • Xin Li
Regular Paper


In this paper, we propose a generative model, the author-topic-community (ATC) model, for representing a corpus of linked documents. The ATC model associates each author with a topic distribution and a community distribution as model parameters. We derive a learning algorithm based on variational inference for parameter estimation, during which the two distributions effectively reinforce each other. We compare the ATC model with two related generative models, first on synthetic data sets and then on real data sets, which include a research community data set, a blog data set, a news-sharing data set, and a microblogging data set. The empirical results confirm that the proposed ATC model outperforms the existing models on tasks such as author interest profiling and author community discovery. We also demonstrate how the inferred ATC model can be used to characterize the roles of users/authors in online communities.


Keywords: Graphical models; Author community discovery; Author interest profiling; Variational inference



C. Li’s research is supported in part by NSFC under Grant No. 61370213, the National Key Technology R&D Program under Grant Nos. 2012BAH10F03 and 2013BAH17F00, the Shenzhen Strategic Emerging Industries Program under Grant No. JCYJ20120613150552967, and the Science and Technology Development of Shandong Province under Grant Nos. 2010GZX20126 and 2010GGX10116.



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Chunshan Li (1)
  • William K. Cheung (2)
  • Yunming Ye (3, 4)
  • Xiaofeng Zhang (3, 4)
  • Dianhui Chu (1)
  • Xin Li (5)
  1. School of Computer Science and Technology, Harbin Institute of Technology, Weihai, China
  2. Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong SAR
  3. Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China
  4. Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen, China
  5. School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
