Abstract
Clustering is a fundamental task in network analysis. A widely used method is k-means clustering, which iteratively refines a clustering by minimizing the distance between each data point and its cluster center. A key issue for k-means clustering is initialization, which heavily affects both its accuracy and its computational cost. This issue is particularly critical when k-means clustering is applied to graph data, where nodes are not embedded in a metric space. In this paper, we propose using a diversified ranking method to initialize k-means clustering, i.e., to find a set of seed nodes. In diversified ranking, seed nodes are selected by considering their centrality and diversity in a unified manner. With the seed nodes as starting points, k-means clustering is then used to cluster nodes into groups. We apply the proposed method to detect communities in synthetic and real-world networks. Results indicate that the proposed method is both effective and efficient.
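The idea of seeding a graph clustering with central yet mutually distant nodes can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the ranking method or the distance used, so the sketch assumes PageRank as the centrality score, a simple greedy rule (skip the neighbors of already chosen seeds) as a stand-in for diversified ranking, and hop distance for a single assignment step rather than a full iterative k-means. The function names `pagerank`, `diversified_seeds`, and `assign_to_nearest_seed` and the toy graph are hypothetical.

```python
from collections import deque


def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph {node: set(neighbors)}.

    Assumes every node has at least one neighbor.
    """
    n = len(adj)
    score = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        score = {
            v: (1 - damping) / n
            + damping * sum(score[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    return score


def diversified_seeds(adj, k):
    """Greedy stand-in for diversified ranking (an assumption, not the paper's method).

    Visit nodes from most to least central; accept a node as a seed only if
    neither it nor any of its neighbors has been accepted already.
    """
    score = pagerank(adj)
    seeds, blocked = [], set()
    for v in sorted(adj, key=lambda v: (-score[v], v)):
        if v in blocked:
            continue
        seeds.append(v)
        blocked.add(v)
        blocked.update(adj[v])
        if len(seeds) == k:
            break
    return seeds


def assign_to_nearest_seed(adj, seeds):
    """Multi-source BFS: label each reachable node with the seed closest in hops."""
    label = {s: s for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in label:
                label[w] = label[u]
                queue.append(w)
    return label


# Toy graph: two hub-centered groups (hubs 0 and 5) joined by the edge 4-6.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4),
         (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (7, 8), (8, 9),
         (4, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

seeds = diversified_seeds(adj, k=2)
labels = assign_to_nearest_seed(adj, seeds)
```

On this toy graph the two hubs are selected as seeds and each group is assigned to its own hub. A fuller treatment would alternate assignment and center updates until the clustering stops changing, as k-means does.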
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sun, BJ., Shen, HW., Cheng, XQ. (2015). Improve Network Clustering via Diversified Ranking. In: Thai, M., Nguyen, N., Shen, H. (eds) Computational Social Networks. CSoNet 2015. Lecture Notes in Computer Science, vol 9197. Springer, Cham. https://doi.org/10.1007/978-3-319-21786-4_9
DOI: https://doi.org/10.1007/978-3-319-21786-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21785-7
Online ISBN: 978-3-319-21786-4
eBook Packages: Computer Science, Computer Science (R0)