Advertisement

Knowledge and Information Systems

, Volume 24, Issue 3, pp 393–413 | Cite as

A general framework for relation graph clustering

  • Bo LongEmail author
  • Zhongfei Zhang
  • Philip S. Yu
Regular Paper

Abstract

Relation graphs, in which multi-type (or single type) nodes are related to each other, frequently arise in many important applications, such as Web mining, information retrieval, bioinformatics, and epidemiology. In this study, We propose a general framework for clustering on relation graphs. Under this framework, we derive a family of clustering algorithms including both hard and soft versions, which are capable of learning cluster patterns from relation graphs with various structures and statistical properties. A number of classic approaches on special cases of relation graphs, such as traditional graphs with singly-type nodes and bi-type relation graphs with two types of nodes, can be viewed as special cases of the proposed framework. The theoretic analysis and experiments demonstrate the great potential and effectiveness of the proposed framework and algorithm.

Keywords

Relational graph clustering Heterogeneous links Homogeneous links Bregman divergences 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha D (2004) A generalized maximum entropy approach to bregman co-clustering and matrix approximation. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 509–514Google Scholar
  2. 2.
    Banerjee A, Merugu S, Dhillon I, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6: 1705–1749zbMATHMathSciNetGoogle Scholar
  3. 3.
    Bui T, Jones C (1993) A heuristic for reducing fill in sparse matrix factorization. In: 6th SIAM conference on parallel processing for scientific computing, pp 445–452Google Scholar
  4. 4.
    Chan P, Schlag M, Zien J (1993) Spectral k-way ratio-cut partitioning and clustering. In: Proceedings of the 30th international conference on design automation. ACM, New York, NY, USA, pp 749–754Google Scholar
  5. 5.
    Della Pietra S, Della Pietra V, Lafferty J (2002) Duality and auxiliary functions for Bregman distances. Technical report, Technical Report CMU-CS-01-109, School of Computer Science, Carnegie Mellon University, 2001Google Scholar
  6. 6.
    Dhillon I (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 269–274Google Scholar
  7. 7.
    Dhillon I, Guan Y, Kulis B (2005) A fast kernel-based multilevel algorithm for graph clustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 629–634Google Scholar
  8. 8.
    Dhillon I, Guan Y, Kulis B (2005) A unified view of kernel k-means, spectral clustering and graph cuts. The University of Texas at Austin, Department of Computer Sciences, Technical Report TR-04Google Scholar
  9. 9.
    Dhillon I, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 89–98Google Scholar
  10. 10.
    Ding C, He X, Zha H, Gu M, Simon H (2001) A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings IEEE international conference on data mining, 2001, ICDM 2001, pp 107–114Google Scholar
  11. 11.
    El-Yaniv R, Souroujon O (2001) Iterative double clustering for unsupervised and semi-supervised learning. In: EMCL’01: Proceedings of the 12th European conference on machine learning. Springer, London, UK, pp 121–132. ISBN:3-540-42536-5Google Scholar
  12. 12.
    Gao B, Liu T, Zheng X, Cheng Q, Ma W (2005) Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 41–50Google Scholar
  13. 13.
    Golub G, Van Loan C (1996) Matrix computations. Johns Hopkins University Press, BaltimorezbMATHGoogle Scholar
  14. 14.
    Hendrickson B, Leland R (1995) A multilevel algorithm for partitioning graphs. In: Supercomputing ’95: proceedings of the 1995 ACM/IEEE conference on supercomputing (CDROM). ACM, New York, NY, USA, p 28Google Scholar
  15. 15.
    Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 50–57Google Scholar
  16. 16.
    Hofmann T, Puzicha J (1999), Latent class models for collaborative filtering. In: IJCAI ’99: proceedings of the sixteenth international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, CA, USA, pp 688–693Google Scholar
  17. 17.
    Karypis G, Kumar V (1999) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1): 359zbMATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    Kernighan B, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2): 291–307Google Scholar
  19. 19.
    Lang K (1995) Newsweeder: learning to filter netnews. In: Proceedings of the twelfth international conference on machine learning, pp 331–339Google Scholar
  20. 20.
    Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13: 556–562Google Scholar
  21. 21.
    Li T (2005) A general model for clustering binary data. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 188–197Google Scholar
  22. 22.
    Long B, Binghamton S, Wu X, Zhang Z, Yu P (2007) Community learning by graph approximation. In: Proceedings of the 2007 seventh IEEE international conference on data mining. IEEE Computer Society, Washington, DC, USA, pp 232–241Google Scholar
  23. 23.
    Long B, Wu X, Zhang Z, Yu P (2006) Unsupervised learning on k-partite graphs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 317–326Google Scholar
  24. 24.
    Long B, Zhang Z, Yu P (2005) Co-clustering by block value decomposition. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 635–640Google Scholar
  25. 25.
    Long B, Zhang Z, Wu X, Yu P (2006) Spectral clustering for multi-type relational data. In: Proceedings of the 23rd international conference on machine learning. ACM, New York, NY, USA, pp 585–592Google Scholar
  26. 26.
    Neville J, Adler M, Jensen D (2003) Clustering relational data using attribute and link information. In: Proceedings of the IJCAI text mining and link analysis workshopGoogle Scholar
  27. 27.
    Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2: 849–856Google Scholar
  28. 28.
    Salakhutdinov R, Roweis S (2003) Adaptive overrelaxed bound optimization methods. In: Machine learning-international workshop then conference, vol 20, p 664Google Scholar
  29. 29.
    Shental N, Zomet A, Hertz T, Weiss Y (2003) Pairwise clustering and graphical models. Adv Neural Inf Process Syst 16Google Scholar
  30. 30.
    Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8): 888–905CrossRefGoogle Scholar
  31. 31.
    Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617zbMATHCrossRefMathSciNetGoogle Scholar
  32. 32.
    Tishby N, Pereira FC, Bialek W (1999) The information bottleneck method. In: Proceedings of the 37th annual allerton conference on communication, control and computing, pp 368–377Google Scholar
  33. 33.
    Van Dongen SM (2000) Graph clustering by flow simulation. PhD thesis, University of UtrechtGoogle Scholar
  34. 34.
    Wang J, Zeng H, Chen Z, Lu H, Tao L, Ma W (2003) ReCoM: reinforcement clustering of multi-type interrelated data objects. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in informaion retrieval. ACM, New York, NY, USA, pp 274–281Google Scholar
  35. 35.
    Yu K, Yu S, Tresp V (2006) Soft clustering on graphs. Adv Neural Inf Process Syst 18: 1553Google Scholar
  36. 36.
    Yu SX, Shi J (2003) Multiclass spectral clustering. In: ICCV ’03: Proceedings of the ninth IEEE international conference on computer vision. IEEE Computer Society, Washington, DC, USA, p 313Google Scholar
  37. 37.
    Zeng H-J, Chen Z, Ma W-Y (2002) A unified framework for clustering heterogeneous web objects. In: WISE’02: Proceedings of the 3rd international conference on web information systems engineering. IEEE Computer Society, Washington, DC, USA, pp 161–172. ISBN:0-7695-1766-8Google Scholar
  38. 38.
    Zha H, He X, Ding C, Simon H, Gu M (2001) Bipartite graph partitioning and data clustering. In: Proceedings of the tenth international conference on information and knowledge management. ACM, New York, NY, USA, pp 25–32Google Scholar

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. 1.Yahoo! LabsSunnyvaleUSA
  2. 2.Computer Science DepartmentSUNY BinghamtonBinghamtonUSA
  3. 3.Department of Computer ScienceUI at ChicagoChicagoUSA

Personalised recommendations