Abstract
A hybrid method called JointNMF is presented for discovering latent information in data sets that contain both text content and connection structure. The method optimizes an integrated objective function that combines two components: the Nonnegative Matrix Factorization (NMF) objective function for text content and the Symmetric NMF (SymNMF) objective function for network structure. An effective algorithm for the joint NMF objective function is proposed so that the efficient block coordinate descent framework can be used. The hybrid method simultaneously discovers content associations and related latent connections, with no additional clustering step needed as postprocessing. It is shown that the method can also be applied when the text content is associated with hypergraph edges. An additional capability of JointNMF is the prediction of unknown network information, which is illustrated on several real-world problems such as citation recommendation for papers and leader detection in organizations. The method can also be applied to general data expressed with both feature space vectors and pairwise similarities, and it can be extended to the case of multiple feature spaces or multiple similarity measures. Our experimental results illustrate multiple advantages of the hybrid method when both content and connection structure are available in the data: higher quality clustering results and the discovery of new information, such as the prediction of unknown links.
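To make the idea of an integrated objective concrete, the following is a minimal sketch of one plausible form, assuming a content matrix X (features by documents), a symmetric similarity matrix S (documents by documents), nonnegative factors W and H, and a weight alpha that balances the two terms. These symbols, the exact weighting, and the function names are assumptions introduced for illustration; the abstract does not state them. The solver shown is a plain projected gradient step with backtracking, swapped in for brevity; it is not the block coordinate descent algorithm proposed in the paper.

```python
import numpy as np


def joint_objective(X, S, W, H, alpha):
    """Assumed joint objective: ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2."""
    content_term = np.linalg.norm(X - W @ H, "fro") ** 2    # NMF part (text content)
    network_term = np.linalg.norm(S - H.T @ H, "fro") ** 2  # SymNMF part (connections)
    return content_term + alpha * network_term


def projected_gradient_step(X, S, W, H, alpha, step=1e-2, max_halvings=30):
    """One projected gradient step; the step size is halved until the
    objective does not increase. Illustrative only, not the paper's solver."""
    f_current = joint_objective(X, S, W, H, alpha)
    residual = W @ H - X
    grad_W = 2.0 * residual @ H.T
    # The SymNMF gradient below assumes S is symmetric.
    grad_H = 2.0 * W.T @ residual + 4.0 * alpha * H @ (H.T @ H - S)
    for _ in range(max_halvings):
        # Clipping at zero projects the iterate back onto the nonnegative orthant.
        W_new = np.maximum(W - step * grad_W, 0.0)
        H_new = np.maximum(H - step * grad_H, 0.0)
        if joint_objective(X, S, W_new, H_new, alpha) <= f_current:
            return W_new, H_new
        step *= 0.5
    return W, H  # no acceptable step found; keep the current factors


def cluster_labels(H):
    """Assign each document (column of H) to its largest latent component."""
    return H.argmax(axis=0)
```

Under these assumptions, the shared factor H is what couples the two data sources: each column of H must explain both a document's content and its connections, so cluster labels can be read directly from H without a separate clustering pass.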
Additional information
This work was supported in part by the National Science Foundation (NSF) Grant IIS-1348152 and Defense Advanced Research Projects Agency (DARPA) XDATA program Grant FA8750-12-2-0309. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF or DARPA.
Cite this article
Du, R., Drake, B. & Park, H. Hybrid clustering based on content and connection structure using joint nonnegative matrix factorization. J Glob Optim 74, 861–877 (2019). https://doi.org/10.1007/s10898-017-0578-x
DOI: https://doi.org/10.1007/s10898-017-0578-x