Skip to main content

Hybrid clustering based on content and connection structure using joint nonnegative matrix factorization


A hybrid method called JointNMF is presented which is applied to latent information discovery from data sets that contain both text content and connection structure information. The new method jointly optimizes an integrated objective function, which is a combination of two components: the Nonnegative Matrix Factorization (NMF) objective function for handling text content and the Symmetric NMF (SymNMF) objective function for handling network structure information. An effective algorithm for the joint NMF objective function is proposed so that the efficient method of block coordinate descent framework can be utilized. The proposed hybrid method simultaneously discovers content associations and related latent connections without any need for postprocessing of additional clustering. It is shown that the proposed method can also be applied when the text content is associated with hypergraph edges. An additional capability of the JointNMF is prediction of unknown network information which is illustrated using several real world problems such as citation recommendations of papers and leader detection in organizations. The proposed method can also be applied to general data expressed with both feature space vectors and pairwise similarities and can be extended to the case with multiple feature spaces or multiple similarity measures. Our experimental results illustrate multiple advantages of the proposed hybrid method when both content and connection structure information is available in the data for obtaining higher quality clustering results and discovery of new information such as unknown link prediction.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4



  2. These data sets are available at




  1. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)

    MATH  Google Scholar 

  2. Chang, J., Blei, D.M.: Hierarchical relational models for document networks. Ann. Appl. Stat. 4(1), 124–150 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  3. Choo, J., Lee, C., Reddy, C.K., Park, H.: Utopian: user-driven topic modeling based on interactive nonnegative matrix factorization. IEEE Trans. Vis. Comput. Graph. 19(12), 1992–2001 (2013). doi:10.1109/TVCG.2013.212

    Article  Google Scholar 

  4. Cohn, D.A., Hofmann, T.: The missing link–a probabilistic model of document content and hypertext connectivity. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 430–436. MIT Press, Cambridge (2001)

    Google Scholar 

  5. Cruz, J., Bothorel, C., Poulet, F.: Entropy based community detection in augmented social networks. In: 2011 International Conference on Computational Aspects of Social Networks (CASoN), pp. 163–168 (2011). doi:10.1109/CASON.2011.6085937

  6. Drake, B., Kim, J., Mallick, M., Park, H.: Supervised Raman spectra estimation based on nonnegative rank deficient least squares. In: Proceedings 13th International Conference on Information Fusion, Edinburgh, UK (2010)

  7. Drake, B., Lee-Urban, S., Park, H.: Smallk is a C++/Python high-performance software library for nonnegative matrix factorization (nmf) and hierarchical and flat clustering using the nmf; current version 1.6.2. (2017)

  8. Elhadi, H., Agam, G.: Structure and attributes community detection: comparative analysis of composite, ensemble and selection methods. In: Proceedings of the 7th Workshop on Social Network Mining and Analysis, SNAKDD ’13, pp. 10:1–10:7. ACM, New York, NY, USA (2013). doi:10.1145/2501025.2501034

  9. Erosheva, E., Fienberg, S., Lafferty, J.: Mixed-membership models of scientific publications. Proc. Natl. Acad. Sci. 101(suppl 1), 5220–5227 (2004). doi:10.1073/pnas.0307760101

    Article  Google Scholar 

  10. Gruber, A., Rosen-Zvi, M., Weiss, Y.: Latent topic models for hypertext. In: Proceedings of the Twenty-Fourth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-08), pp. 230–239. AUAI Press, Corvallis, Oregon (2008)

  11. Jin, D., Gabrys, B., Dang, J.: Combined node and link partitions method for finding overlapping communities in complex networks. Scientific Reports 5 (2015). doi:10.1038/srep08600

  12. Kannan, R., Ishteva, M., Drake, B., Park, H.: Bounded matrix low rank approximation. In: Naik, G.R. (ed.) Non-negative Matrix Factorisation Techniques: Advances in Theory and Applications, pp. 89–118. Berlin Heidelberg, Springer (2016)

    Chapter  Google Scholar 

  13. Kannan, R., Ishteva, M., Park, H.: Bounded matrix factorization for recommender system. Knowl. Inf. Syst. 39(3), 491–511 (2014)

    Article  Google Scholar 

  14. Kim, J., He, Y., Park, H.: Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J. Glob. Optim. 58(2), 285–319 (2014). doi:10.1007/s10898-013-0035-4

    Article  MathSciNet  MATH  Google Scholar 

  15. Kim, J., Park, H.: Fast nonnegative matrix factorization: an active-set-like method and comparisons. SIAM J. Sci. Comput. 33(6), 3261–3281 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  16. Kuang, D., Choo, J., Park, H.: Nonnegative matrix factorization for interactive topic modeling and document clustering. In: Celebi, M.E. (ed.) Partitional Clustering Algorithms, pp. 215–243. Springer International Publishing, Berlin (2015). doi:10.1007/978-3-319-09259-1_7

    Google Scholar 

  17. Kuang, D., Park, H.: Fast rank-2 nonnegative matrix factorization for hierarchical document clustering. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 739–747. ACM (2013)

  18. Kuang, D., Park, H., Ding, C.H.: Symmetric nonnegative matrix factorization for graph clustering. In: SDM, vol. 12, pp. 106–117. SIAM (2012)

  19. Kuang, D., Yun, S., Park, H.: SymNMF: Nonnegative low-rank approximation of a similarity matrix for graph clustering. J. Glob. Optim. 62(3), 545–574 (2015). doi:10.1007/s10898-014-0247-2

    Article  MathSciNet  MATH  Google Scholar 

  20. Leskovec, J., Krevl, A.: SNAP datasets: stanford large network dataset collection. (2014)

  21. Liu, J., Wang, C., Gao, J., Han, J.: Multi-view clustering via joint nonnegative matrix factorization. In: Proceedings of the 2013 SIAM International Conference on Data Mining, Proceedings, pp. 252–260. Society for Industrial and Applied Mathematics (2013)

  22. Liu, Y., Niculescu-Mizil, A., Gryc, W.: Topic-link LDA: joint models of topic and author community. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pp. 665–672. ACM, New York, NY, USA (2009). doi:10.1145/1553374.1553460

  23. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)

    Book  MATH  Google Scholar 

  24. Mei, Q., Cai, D., Zhang, D., Zhai, C.: Topic modeling with network regularization. In: Proceedings of the 17th International Conference on World Wide Web, WWW ‘08, pp. 101–110. ACM, New York, NY, USA (2008). doi:10.1145/1367497.1367512

  25. Nallapati, R.M., Ahmed, A., Xing, E.P., Cohen, W.W.: Joint latent topic models for text and citations. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ‘08, pp. 542–550. ACM, New York, NY, USA (2008). doi:10.1145/1401890.1401957

  26. Ruan, Y., Fuhry, D., Parthasarathy, S.: Efficient community detection in large networks using content and links. In: Proceedings of the 22nd International Conference on World Wide Web, WWW ‘13, pp. 1089–1098. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2013)

  27. Strehl, A., Ghosh, J.: Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2003). doi:10.1162/153244303321897735

    MathSciNet  MATH  Google Scholar 

  28. Sun, Y., Aggarwal, C.C., Han, J.: Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. Proc. VLDB Endow. 5(5), 394–405 (2012). doi:10.14778/2140436.2140437

    Article  Google Scholar 

  29. Tang, J., Wang, X., Liu, H.: Integrating social media data for community detection. In: Proceedings of the 2011 International Conference on Modeling and Mining Ubiquitous Social Media, MSM‘11, pp. 1–20. Springer, Berlin, Heidelberg (2012). doi:10.1007/978-3-642-33684-3

  30. Wang, X., Tang, L., Gao, H., Liu, H.: Discovering overlapping groups in social media. In: 2010 IEEE International Conference on Data Mining, pp. 569–578 (2010). doi:10.1109/ICDM.2010.48

  31. Wang, X., Tang, L., Liu, H., Wang, L.: Learning with multi-resolution overlapping communities. Knowl. Inf. Syst. 36(2), 517–535 (2013). doi:10.1007/s10115-012-0555-0

    Article  Google Scholar 

  32. Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix completion with nonnegative factors. Front. Math. China 7(2), 365–384 (2012). doi:10.1007/s11464-012-0194-5

    Article  MathSciNet  MATH  Google Scholar 

  33. Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 587–596. ACM (2013)

  34. Yang, T., Jin, R., Chi, Y., Zhu, S.: Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ‘09, pp. 927–936. ACM, New York, NY, USA (2009). doi:10.1145/1557019.1557120

  35. Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: clustering, classification, and embedding. In: Schölkopf, B., Platt, J.C., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 1601–1608. MIT Press, Cambridge (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Haesun Park.

Additional information

This work was supported in part by the National Science Foundation (NSF) Grant IIS-1348152 and Defense Advanced Research Projects Agency (DARPA) XDATA program Grant FA8750-12-2-0309. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF or DARPA.

Rights and permissions

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Du, R., Drake, B. & Park, H. Hybrid clustering based on content and connection structure using joint nonnegative matrix factorization. J Glob Optim 74, 861–877 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: