
Spectral Clustering in Social Networks

  • Miklós Kurucz
  • András A. Benczúr
  • Károly Csalogány
  • László Lukács
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5439)

Abstract

We evaluate various heuristics for hierarchical spectral clustering in large telephone call and Web graphs. Spectral clustering without additional heuristics often produces very uneven cluster sizes or low-quality clusters that may consist of several disconnected components. This appears to be common across several data sources, yet to our knowledge no general solution has been provided so far. Divide-and-Merge, a recently described postfiltering procedure, may be used to eliminate bad-quality branches in a binary tree hierarchy. We propose an alternative solution that enables k-way cuts in each step by immediately filtering out unbalanced or low-quality clusters before splitting them further.
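
The filtering idea can be illustrated with a minimal Python sketch of a single filtered split. It assumes an unweighted, undirected graph given as adjacency lists, and it performs a two-way split by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue), not the k-way procedure proposed in the paper. The power-iteration solver, the `min_fraction` threshold and all function names are illustrative assumptions. A split is rejected when one side is too small or falls apart into several connected components, mirroring the two failure modes named above.

```python
from collections import deque


def laplacian_matvec(adj, x):
    """Return L @ x for the unnormalized Laplacian L = D - A of an adjacency-list graph."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(adj))]


def fiedler_vector(adj, iterations=200):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.

    Runs power iteration on c*I - L, where c = 2 * (max degree) + 1 strictly
    exceeds the largest Laplacian eigenvalue, and projects out the constant
    eigenvector at every step so the iteration converges to the Fiedler direction.
    """
    n = len(adj)
    c = 2 * max(len(nbrs) for nbrs in adj) + 1
    x = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        lx = laplacian_matvec(adj, x)
        x = [c * xi - li for xi, li in zip(x, lx)]
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the component along the all-ones vector
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
    return x


def is_connected(adj, nodes):
    """BFS restricted to `nodes`; True if the induced subgraph is connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def filtered_spectral_split(adj, min_fraction=0.2):
    """Split by the sign of the Fiedler vector; return None if the split is rejected.

    Hypothetical filter: reject if either side holds fewer than
    `min_fraction` of the nodes or induces a disconnected subgraph.
    """
    n = len(adj)
    if n < 2:
        return None
    f = fiedler_vector(adj)
    left = [i for i in range(n) if f[i] < 0]
    right = [i for i in range(n) if f[i] >= 0]
    if min(len(left), len(right)) < min_fraction * n:
        return None  # too unbalanced
    if not (is_connected(adj, left) and is_connected(adj, right)):
        return None  # a side consists of several disconnected components
    return left, right


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(filtered_spectral_split(adj))  # expected: the two triangles as the two sides
print(filtered_spectral_split(adj, min_fraction=0.6))  # None: 3 of 6 nodes is below 60%
```

In a hierarchical setting, a rejected split would mark the cluster as a leaf instead of being split further; how the paper chooses thresholds and handles k-way cuts is not reproduced here.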

Our experiments are performed on graphs with various edge weightings and normalizations, built from call detail records and Web crawls. We measure clustering quality both by modularity and by the geographic and topical homogeneity of the clusters. Compared to Divide-and-Merge, our method gives more homogeneous clusters with a more desirable distribution of cluster sizes.
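
Modularity compares the weight falling inside each cluster with what a random graph with the same degrees would give. The sketch below assumes the standard definition of Newman and Girvan, Q = Σ_c [ w_c / m − (d_c / 2m)² ], where w_c is the total edge weight inside cluster c, d_c the total weighted degree of its nodes and m the total edge weight of an undirected graph; the paper's exact variant for its weighted call graphs may differ, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Modularity Q = sum_c [ w_c / m - (d_c / (2m))^2 ] of a clustering.

    edges: iterable of (u, v, weight) for an undirected graph, each edge listed once.
    labels: mapping node -> cluster id.
    """
    m = 0.0
    inside = defaultdict(float)  # w_c: weight of edges with both endpoints in c
    degree = defaultdict(float)  # d_c: sum of weighted degrees of nodes in c
    for u, v, w in edges:
        m += w
        degree[labels[u]] += w
        degree[labels[v]] += w
        if labels[u] == labels[v]:
            inside[labels[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by one edge, clustered as the two triangles.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, two_triangles))  # 5/14, about 0.357
```

Putting every node into a single cluster yields Q = 0, so positive values indicate more internal weight than the degree-preserving baseline predicts.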

Keywords

spectral clustering, telephone call graph, social network mining, Web graph



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Miklós Kurucz (1)
  • András A. Benczúr (1)
  • Károly Csalogány (1)
  • László Lukács (1)
  1. Data Mining and Web search Research Group, Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Hungary
