Advertisement

Data Mining and Knowledge Discovery

, Volume 22, Issue 3, pp 493–521 | Cite as

Community discovery using nonnegative matrix factorization

  • Fei WangEmail author
  • Tao Li
  • Xin Wang
  • Shenghuo Zhu
  • Chris Ding
Article

Abstract

Complex networks exist in a wide range of real world systems, such as social networks, technological networks, and biological networks. During the last decades, many researchers have concentrated on exploring some common things contained in those large networks include the small-world property, power-law degree distributions, and network connectivity. In this paper, we will investigate another important issue, community discovery, in network analysis. We choose Nonnegative Matrix Factorization (NMF) as our tool to find the communities because of its powerful interpretability and close relationship between clustering methods. Targeting different types of networks (undirected, directed and compound), we propose three NMF techniques (Symmetric NMF, Asymmetric NMF and Joint NMF). The correctness and convergence properties of those algorithms are also studied. Finally the experiments on real world networks are presented to show the effectiveness of the proposed methods.

Keywords

Community discovery Nonnegative matrix factorization 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Albert R, Jeong H, Barabasi A-L (1999) The diameter of the world wide web. Nature 401: 130CrossRefGoogle Scholar
  2. Amaral LAN, Scala A, Bartheélémy M, Stanley HE (2000) Classes of small-world networks. Proc Natl Acad Sci USA 97: 11149–11152CrossRefGoogle Scholar
  3. Barthelemy M, Amaral LAN (1999) Small-world networks: evidence for a crossover picture. Phys Rev Lett 82: 3180CrossRefGoogle Scholar
  4. Brunet JP, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA 102(12): 4164–4169CrossRefGoogle Scholar
  5. Chen G, Wang F, Zhang C (2007) Collaborative filtering using orthogonal nonnegative matrix tri-factorization. In: ICDM workshops on high performance computing, pp 303–308Google Scholar
  6. Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of SIAM international conference on data mining, pp 606–610Google Scholar
  7. Ding C, Li T, Jordan MI (2006a) Convex and semi-nonnegative matrix factorizations. LBNL Tech Report 60428Google Scholar
  8. Ding C, Li T, Peng W, Park H (2006b) Orthogonal nonnegative matrix tri-factorizations for clustering. In: SIGKDD, pp 126–135Google Scholar
  9. Ding C, Li T, Peng W (2008) On the equivalence between nonnegative matrix factorization and probabilistic latent semantic indexing. Comput Stat Data Anal 52(1): 155–173MathSciNetGoogle Scholar
  10. Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: SIGCOMM, pp 251–262Google Scholar
  11. Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: SIGKDD, pp 150–160Google Scholar
  12. Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12): 7821–7826CrossRefzbMATHMathSciNetGoogle Scholar
  13. Hofmann T, Puzicha J (1999) Latent class models for collaborative filtering. In: IJCAI, pp 688–693Google Scholar
  14. Ino H, Kudo M, Nakamura A (2005) Partitioning of web graphs by community topology. In: WWW, pp 661–669Google Scholar
  15. Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401: 788–791CrossRefGoogle Scholar
  16. Lee DD, Seung HS (2000) Algorithms for non-negative matrix factorization, pp 556–562Google Scholar
  17. Long B, Zhang Z, Wu X, Yu PS (2007) Relational clustering by symmetric convex coding. In: Proceedings of the 24th international conference on machine learning, pp 569–576Google Scholar
  18. Miao G, Song Y, Zhang D, Bai H (2008) Parallel spectral clustering algorithm for large-scale community data mining. In: The 17th WWW workshop on social web search and mining (SWSM)Google Scholar
  19. Newman MEJ (2004a) Detecting community structure in networks. Eur Phys J B 38: 321–330CrossRefGoogle Scholar
  20. Newman MEJ (2004b) Fast algorithm for detecting community structure in very large networks. Phys Rev E 69Google Scholar
  21. Paatero P, Tapper U (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5: 111–126CrossRefGoogle Scholar
  22. Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818Google Scholar
  23. Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. In: The 4th SIAM international conference on data mining, pp 452–456Google Scholar
  24. Pennock DM, Horvitz E, Lawrence S, Giles CL (2000) Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In: UAIGoogle Scholar
  25. Priebe CE, Conroy JM, Marchette DJ, Park Y (2005) Scan statistics on Enron graphs. In: Proceedings of SIAM international conference on data miningGoogle Scholar
  26. Resnick P., Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) Grouplens: an open architecture for collaborative filtering of netnews, pp 175–186Google Scholar
  27. Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDMGoogle Scholar
  28. Scott J (2000) Social network analysis: a handbook, 2nd edn. Sage Publications, LondonGoogle Scholar
  29. Sharan R et al (2005) From the cover: conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA 102(6): 1974–1979CrossRefGoogle Scholar
  30. Strehl A, Ghosh J, Cardie C (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617CrossRefGoogle Scholar
  31. von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4): 395–416CrossRefMathSciNetGoogle Scholar
  32. Wang F, Ma S, Yang L, Li T (2006) Recommendation on item graphs. In: ICDM, pp 1119–1123Google Scholar
  33. Wang D, Li T, Zhu S, Ding CHQ (2008a) Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 307–314Google Scholar
  34. Wang F, Li T, Zhang C (2008b) Semi-supervised clustering via matrix factorization. In: The 8th SIAM international conference on data miningGoogle Scholar
  35. Wasserman S, Faust K (1994) Social network analysis. Cambridge University Press, CambridgeGoogle Scholar
  36. Watts DJ, Strogatz SH (1998) Collective dynamics of ’small-world’ networks. Nature 393: 440–442CrossRefGoogle Scholar
  37. Weiss Y (1999) Segmentation using eigenvectors: a unifying view. In: ICCV, pp 975–982Google Scholar
  38. Xie YL, Hopke PK, Paatero P (1999) Positive matrix factorization applied to a curve resolution problem. J Chemometr 12(6): 357–364CrossRefGoogle Scholar
  39. Yu K, Tresp V (2005) Learning to learn and collaborative filtering. In: NIPS workshop on inductive transfer: 10 years laterGoogle Scholar
  40. Zhou D, Huang J, Schölkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of the 22nd international conference on machine learning, pp 1036–1043Google Scholar
  41. Zhang H, Giles CL, Foley HC, Yen J (2007) Probabilistic community discovery using hierarchical latent gaussian mixture model. In: AAAI, pp 663–668Google Scholar

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Fei Wang
    • 1
    • 2
    Email author
  • Tao Li
    • 1
  • Xin Wang
    • 1
  • Shenghuo Zhu
    • 3
  • Chris Ding
    • 4
  1. 1.School of Computing and Information SciencesFlorida International UniversityMiamiUSA
  2. 2.IBM T.J. Watson Research LabHawthorneUSA
  3. 3.NEC Research Lab America at CupertinoCupertinoUSA
  4. 4.Department of Computer Science and EngineeringUniversity of Texas at ArlingtonArlingtonUSA

Personalised recommendations