Abstract
We evaluate various heuristics for hierarchical spectral clustering in large telephone call and Web graphs. Without additional heuristics, spectral clustering often produces very uneven cluster sizes or low-quality clusters that may consist of several disconnected components. This behavior appears to be common across several data sources, but to our knowledge no general solution has been provided so far. Divide-and-Merge, a recently described postfiltering procedure, may be used to eliminate bad-quality branches in a binary tree hierarchy. We propose an alternative solution that enables k-way cuts in each step by immediately filtering unbalanced or low-quality clusters before splitting them further.
Our experiments are performed on graphs with various weighting and normalization schemes, built from call detail records and Web crawls. We measure clustering quality both by modularity and by the geographic and topical homogeneity of the clusters. Compared to Divide-and-Merge, our method gives more homogeneous clusters with a more desirable distribution of cluster sizes.
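As an illustration of the modularity measure mentioned above, the following is a minimal sketch assuming the standard Newman-Girvan definition on an undirected weighted graph; the abstract does not state which variant the paper uses. The function name `modularity`, the edge-list input format, and the toy graph are hypothetical choices made for this example only.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of a partition (assumed definition).

    edges: iterable of (u, v, weight), each undirected edge listed once.
    cluster_of: dict mapping every node to its cluster label.
    """
    total_weight = 0.0             # m: sum of all edge weights
    internal = defaultdict(float)  # edge weight lying inside each cluster
    degree = defaultdict(float)    # summed weighted degree of each cluster
    for u, v, w in edges:
        total_weight += w
        degree[cluster_of[u]] += w
        degree[cluster_of[v]] += w
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += w
    if total_weight == 0:
        return 0.0
    # Q = sum over clusters of (internal fraction) - (degree fraction)^2
    return sum(
        internal[c] / total_weight - (degree[c] / (2.0 * total_weight)) ** 2
        for c in degree
    )


# Toy graph (hypothetical): two triangles joined by a single edge.
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "all" for node in range(6)}

q_split = modularity(toy_edges, two_clusters)
q_whole = modularity(toy_edges, one_cluster)
```

On this toy graph, splitting along the bridge edge gives a modularity of 5/14, while placing all nodes in one cluster gives 0, so the measure rewards the split that keeps each triangle together.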
Supported by a Yahoo Faculty Research Grant and by grant ASTOR NKFP 2/004/05. This work is based on an earlier work: Spectral Clustering in Telephone Call Graphs, in Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pages 82–91 (2007), © ACM, 2007. http://doi.acm.org/10.1145/1348549.1348559
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kurucz, M., Benczúr, A.A., Csalogány, K., Lukács, L. (2009). Spectral Clustering in Social Networks. In: Zhang, H., et al. Advances in Web Mining and Web Usage Analysis. SNAKDD 2007. Lecture Notes in Computer Science, vol 5439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00528-2_1
DOI: https://doi.org/10.1007/978-3-642-00528-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00527-5
Online ISBN: 978-3-642-00528-2
eBook Packages: Computer Science (R0)