Journal of Global Optimization, Volume 74, Issue 4, pp 861–877

Hybrid clustering based on content and connection structure using joint nonnegative matrix factorization

  • Rundong Du
  • Barry Drake
  • Haesun Park

Abstract

A hybrid method called JointNMF is presented for latent information discovery in data sets that contain both text content and connection structure information. The method jointly optimizes an integrated objective function that combines two components: the Nonnegative Matrix Factorization (NMF) objective function, which handles text content, and the Symmetric NMF (SymNMF) objective function, which handles network structure information. An effective algorithm for the joint NMF objective function is proposed so that the efficient block coordinate descent framework can be used. The proposed hybrid method discovers content associations and related latent connections simultaneously, without any need for postprocessing or additional clustering. It is shown that the method can also be applied when the text content is associated with hypergraph edges. An additional capability of JointNMF is the prediction of unknown network information, which is illustrated on several real-world problems such as citation recommendation for papers and leader detection in organizations. The method can also be applied to general data expressed with both feature space vectors and pairwise similarities, and it can be extended to cases with multiple feature spaces or multiple similarity measures. Our experimental results illustrate several advantages of the hybrid method when both content and connection structure information are available, including higher-quality clustering results and the discovery of new information such as unknown links.
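
The abstract does not state the integrated objective function explicitly. As a rough sketch only, one plausible form that combines an NMF term with a SymNMF term is shown below. All symbols are assumptions introduced here for illustration: X is a nonnegative feature-by-item matrix (for example, terms by documents), S is a symmetric nonnegative similarity or adjacency matrix over the same items, W and H are the nonnegative factors with k rows in H, and α > 0 is a weight balancing the two components.

```latex
% Assumed notation (not taken from the abstract):
%   X \in \mathbb{R}_{+}^{m \times n}  content (feature-by-item) matrix
%   S \in \mathbb{R}_{+}^{n \times n}  symmetric connection matrix
%   W \in \mathbb{R}_{+}^{m \times k},\; H \in \mathbb{R}_{+}^{k \times n}
\min_{W \ge 0,\; H \ge 0} \;
  \underbrace{\lVert X - W H \rVert_F^2}_{\text{NMF term (content)}}
  \; + \;
  \alpha \, \underbrace{\lVert S - H^{\mathsf{T}} H \rVert_F^2}_{\text{SymNMF term (connections)}}

% One common device for making such an objective amenable to block
% coordinate descent is to decouple the two copies of H with an
% auxiliary variable \hat{H} and a penalty weight \beta > 0:
\min_{W,\, H,\, \hat{H} \ge 0} \;
  \lVert X - W H \rVert_F^2
  + \alpha \lVert S - \hat{H}^{\mathsf{T}} H \rVert_F^2
  + \beta \lVert \hat{H} - H \rVert_F^2
```

In the second form each of W, H, and \hat{H} appears at most quadratically when the other two are held fixed, so each block update reduces to a nonnegativity-constrained least squares problem. Whether this matches the exact formulation in the article is not confirmed by the abstract. Under this reading, column j of H would act as the cluster membership vector for item j, which is consistent with the claim that no separate clustering step is needed.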

Keywords

Joint nonnegative matrix factorization; Symmetric NMF; Constrained low rank approximation; Content clustering; Graph clustering; Hybrid content and connection structure analysis

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Mathematics, Georgia Institute of Technology, Atlanta, USA
  2. Georgia Tech Research Institute, Georgia Institute of Technology, Atlanta, USA
  3. School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, USA
