Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering


Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, and they are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed: most existing algorithms assume the number of clusters to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.


  1. That is, we consider only node-attributed graphs throughout the paper.

  2. Non-regular models are models that do not satisfy the regularity conditions assumed by BIC [4].

  3. The zero diagonal of \(\mathbf{X}\) means that the corresponding graph has no self-loops, while symmetry means that the graph is undirected, in accordance with our focus on undirected simple graphs.

  4. Our definition of clustering requires as few edges as possible between distinct clusters.

  5. The multinomial and Dirichlet distributions are conjugate. As a special case, the Bernoulli and Beta distributions are conjugate as well.

  6. The stick-breaking prior is a representation of the Dirichlet process and is often used for variational inference. The Dirichlet process here is the distribution of a random probability measure over the positive integers.

  7. That is, each prior is a uniform distribution over the components. This is reasonable given that we have no prior information on the proportions of the different components, so they are treated as equally important.

  8. The corresponding assortativity coefficient is negative, \(r=-0.079\).
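
The stick-breaking representation of the Dirichlet process mentioned in the notes above can be illustrated with a minimal sketch. It assumes a truncated construction, which is common in variational inference. The function name `stick_breaking_weights`, the concentration parameter `alpha`, and the truncation level `truncation` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=None):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha      -- concentration parameter of the Dirichlet process (assumed name)
    truncation -- number of components kept (assumed name, must be >= 1)
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component absorbs the leftover mass
    return weights


weights = stick_breaking_weights(alpha=1.0, truncation=10, seed=0)
print(weights)
```

The weights are non-negative and sum to one, so they define a probability distribution over component indices. In this construction, smaller values of `alpha` tend to concentrate the mass on the first few components.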


  1. Akoglu L, Tong H, Meeder B, Faloutsos C (2012) PICS: parameter-free identification of cohesive subgroups in large attributed graphs. SDM, pp 439–450

  2. Banerjee B, Bovolo F, Bhattacharya A, Bruzzone L, Chaudhuri S, Mohan BK (2015) A new self-training-based unsupervised satellite image classification technique using cluster ensemble strategy. IEEE Geosci Remote Sens Lett 12(4):741–745

  3. Beal MJ (2003) Variational algorithms for approximate Bayesian inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London

  4. Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, Secaucus

  5. Bothorel C, Cruz JD, Magnani M, Micenková B (2015) Clustering attributed graphs: models, measures and methods. CoRR arXiv:1501.01676

  6. Daudin J-J, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183

  7. Ester M, Ge R, Gao BJ, Hu Z, Ben-Moshe B (2006) Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In: Proceedings of the sixth SIAM international conference on data mining, Bethesda, MD, USA, 20–22 April 2006. pp 246–257. doi:10.1137/1.9781611972764.22

  8. Fujimaki R, Hayashi K (2012) Factorized asymptotic Bayesian hidden Markov models. In: Proceedings of the 29th international conference on machine learning, ICML 2012, Edinburgh, Scotland, UK, 26 June–1 July, 2012

  9. Fujimaki R, Morinaga S (2012) Factorized asymptotic Bayesian inference for mixture modeling. In: Proceedings of the fifteenth international conference on artificial intelligence and statistics, AISTATS 2012, La Palma, Canary Islands, 21–23 April 2012. pp 400–408

  10. Ghahramani Z, Beal MJ (1999) Variational inference for Bayesian mixtures of factor analysers. In: Advances in neural information processing systems 12, NIPS conference, Denver, Colorado, USA, 29 November–4 December, 1999. pp 449–455

  11. Henderson K, Eliassi-Rad T, Papadimitriou S, Faloutsos C (2010) HCDF: a hybrid community discovery framework. In: Proceedings of the SIAM international conference on data mining, SDM 2010, Columbus, Ohio, USA, 29 April–1 May, 2010. pp 754–765. doi:10.1137/1.9781611972801.66

  12. Henderson K, Gallagher B, Eliassi-Rad T, Tong H, Basu S, Akoglu L, Koutra D, Faloutsos C, Li L (2012) RolX: structural role extraction & mining in large graphs. In: The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’12, Beijing, China, August 12–16, 2012, pp 1231–1239

  13. Hofmann T (1999) Probabilistic latent semantic indexing. In: SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, Berkeley, CA, USA, 15–19 August 1999. pp 50–57. doi:10.1145/312624.312649

  14. Jordan MI, Ghahramani Z, Jaakkola T, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233

  15. Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392

  16. Kurihara K, Welling M, Teh YW (2007) Collapsed variational Dirichlet process mixture models. In: IJCAI 2007, Proceedings of the 20th international joint conference on artificial intelligence, Hyderabad, India, January 6–12, 2007. pp 2796–2801

  17. Lazarsfeld PF, Henry NW (1968) Latent structure analysis. Houghton Mifflin, Boston

  18. Lu Z, Sun X, Wen Y, Cao G, Porta TFL (2015) Algorithms and applications for community detection in weighted networks. IEEE Trans Parallel Distrib Syst 26(11):2916–2926

  19. Luo G (2016) A review of automatic selection methods for machine learning algorithms and hyper-parameter values. NetMAHIB 5(1):18. doi:10.1007/s13721-016-0125-6

  20. Miller JW, Harrison MT (2013) A simple example of Dirichlet process mixture inconsistency for the number of components. In: Advances in neural information processing systems, vol 26, pp 199–206

  21. Moser F, Ge R, Ester M (2007) Joint cluster analysis of attribute and relationship data without a-priori specification of the number of clusters. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, San Jose, California, USA, 12–15 August 2007. pp 510–519. doi:10.1145/1281192.1281248

  22. Nallapati R, Ahmed A, Xing EP, Cohen WW (2008) Joint latent topic models for text and citations. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, Nevada, USA, 24–27 August 2008. pp 542–550. doi:10.1145/1401890.1401957

  23. Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89(20):208701

  24. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:066113

  25. Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems 14 [neural information processing systems: natural and synthetic, NIPS 2001, December 3–8, 2001, Vancouver, British Columbia, Canada], pp 849–856

  26. Nowicki K, Snijders TA (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087

  27. Papadopoulos A, Rafailidis D, Pallis G, Dikaiakos MD (2015) Clustering attributed multi-graphs with information ranking. In: Database and expert systems applications—26th international conference, DEXA 2015, Valencia, Spain, September 1–4, 2015. Proceedings, Part I, pp 432–446

  28. Semertzidis T, Rafailidis D, Strintzis MG, Daras P (2015) Large-scale spectral clustering based on pairwise constraints. Inf Process Manag 51(5):616–624

  29. Steinhaeuser K, Chawla NV (2008) Community detection in a large real-world social network. In: Social computing, behavioral modeling, and prediction, pp 168–175

  30. Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  31. Sun Y, Aggarwal CC, Han J (2012) Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. PVLDB 5(5):394–405

  32. Teh YW (2010) Dirichlet process. In: Encyclopedia of machine learning, pp 280–287. doi:10.1007/978-0-387-30164-8_219

  33. Vretos N, Solachidis V, Pitas I (2011) A mutual information based face clustering algorithm for movie content analysis. Image Vis Comput 29(10):693–705

  34. Xu Z, Ke Y (2016) Effective and efficient spectral clustering on text and link data. In: Proceedings of the 25th ACM international on conference on information and knowledge management, CIKM 2016, Indianapolis, IN, USA, October 24–28, 2016, pp 357–366

  35. Xu Z, Ke Y (2016) Stochastic variance reduced Riemannian eigensolver. CoRR arXiv:1605.08233

  36. Xu Z, Ke Y, Wang Y (2014) A fast inference algorithm for stochastic blockmodel. In: 2014 IEEE international conference on data mining, ICDM 2014, Shenzhen, China, December 14–17, 2014, pp 620–629

  37. Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2012) A model-based approach to attributed graph clustering. In: SIGMOD conference, pp 505–516

  38. Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2014) GBAGC: a general Bayesian framework for attributed graph clustering. TKDD 9(1):5:1–5:43

  39. Xu Z, Zhao P, Cao J, Li X (2016) Matrix eigen-decomposition via doubly stochastic Riemannian optimization. In: Proceedings of the 33rd international conference on machine learning, ICML 2016, New York City, NY, USA, June 19–24, 2016, pp 1660–1669

  40. Yang J, McAuley JJ, Leskovec J (2013) Community detection in networks with node attributes. In: IEEE 13th international conference on data mining, Dallas, TX, USA, 7–10 December 2013. pp 1153–1156. doi:10.1109/ICDM.2013.167

  41. Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, Paris, France, 28 June–1 July, 2009. pp 927–936. doi:10.1145/1557019.1557120

  42. Yu S, Yu K, Tresp V, Kriegel H-P (2006) Variational Bayesian Dirichlet-multinomial allocation for exponential family mixtures. In: Machine learning: ECML 2006, 17th European conference on machine learning, Berlin, Germany, 18–22 September 2006. pp 841–848. doi:10.1007/11871842_87

  43. Zanghi H, Volant S, Ambroise C (2010) Clustering based on random graph model embedding vertex features. Pattern Recognit Lett 31(9):830–836

  44. Zhou T, Lü L, Zhang Y (2009) Predicting missing links via local information. Eur Phys J B Condens Matter Complex Syst 71(4):623–630

  45. Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. PVLDB 2(1):718–729

  46. Zobay O (2009) Mean field inference for the Dirichlet process mixture model. Electron J Stat 3:507–545


The authors would like to thank the anonymous reviewers for their valuable comments, which helped significantly improve the quality of the paper.

Author information



Corresponding author

Correspondence to Zhiqiang Xu.

About this article

Cite this article

Xu, Z., Cheng, J., Xiao, X. et al. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering. Knowl Inf Syst 53, 239–268 (2017).


  • Attributed graph clustering
  • Model selection
  • Dirichlet process
  • Factorized information criterion