Graph Clustering Based on Attribute-Aware Graph Embedding

  • Esra Akbas
  • Peixiang Zhao
Chapter
Part of the Lecture Notes in Social Networks book series (LNSN)

Abstract

Graph clustering is a fundamental problem in graph mining and network analysis. To group the vertices of a graph into densely knit clusters, each well separated from all the others, classic methods rely primarily on the graph structure alone when modeling and quantifying the proximity or distance between vertices. However, with the proliferation of rich, heterogeneous attribute information in real-world graphs, such as user profiles in social networks and GO (Gene Ontology) terms in protein interaction networks, it becomes essential to combine both the structure and the attribute information of graphs to yield better-quality clusters. In this chapter, we propose a new graph embedding approach for attributed graph clustering. We embed each vertex of a graph into a continuous vector space in which the local structure and attribute information surrounding the vertex are jointly encoded in a unified, latent representation. Specifically, we convert vertex-wise attribute proximity into edge weights and leverage a group of truncated, attribute-aware random walks to learn the latent representations of vertices. In this way, the challenging attributed graph clustering problem is cast as the traditional problem of multidimensional data clustering, for which efficient and cost-effective solutions exist. We apply our attribute-aware graph embedding algorithm to a series of real-world and synthetic attributed graphs and networks. The experimental studies demonstrate that our method significantly outperforms state-of-the-art attributed graph clustering techniques in both clustering effectiveness and efficiency.
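The first two steps described above (turning attribute proximity into edge weights, then sampling truncated random walks biased by those weights) can be illustrated with a short sketch. This is not the authors' implementation: the weight formula used here (1 for the structural link plus the Jaccard similarity of the endpoints' attribute sets) is a hypothetical choice made for illustration, and the function names `attribute_weight`, `weighted_graph`, and `truncated_walks` are likewise assumptions. In a DeepWalk-style pipeline, the resulting walks would then be treated as "sentences" for a skip-gram model to produce vertex vectors, which an off-the-shelf method such as k-means could cluster; those later stages are not shown.

```python
import random
from typing import Dict, List, Set, Tuple

def attribute_weight(a: Set[str], b: Set[str]) -> float:
    """Hypothetical edge weight: 1 for the structural link plus the
    Jaccard similarity of the two vertices' attribute sets."""
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    return 1.0 + jaccard

def weighted_graph(edges: List[Tuple[str, str]],
                   attrs: Dict[str, Set[str]]) -> Dict[str, List[Tuple[str, float]]]:
    """Build an undirected weighted adjacency list from edges and vertex attributes."""
    adj: Dict[str, List[Tuple[str, float]]] = {v: [] for v in attrs}
    for u, v in edges:
        w = attribute_weight(attrs[u], attrs[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def truncated_walks(adj: Dict[str, List[Tuple[str, float]]],
                    walk_length: int, walks_per_vertex: int,
                    seed: int = 0) -> List[List[str]]:
    """Sample fixed-length (truncated) random walks from every vertex; each step
    picks a neighbor with probability proportional to the attribute-aware weight."""
    rng = random.Random(seed)
    walks: List[List[str]] = []
    vertices = sorted(adj)
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a random order each pass
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:  # isolated vertex: stop the walk early
                    break
                targets = [n for n, _ in nbrs]
                weights = [w for _, w in nbrs]
                walk.append(rng.choices(targets, weights=weights, k=1)[0])
            walks.append(walk)
    return walks

# Toy attributed graph (made-up data): a path a-b-c-d with keyword attributes.
attrs = {"a": {"db", "ml"}, "b": {"ml"}, "c": {"bio"}, "d": {"bio", "chem"}}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
adj = weighted_graph(edges, attrs)
walks = truncated_walks(adj, walk_length=5, walks_per_vertex=2)
```

With these weights, a walk at `b` is more likely to step to `a` (weight 1.5, shared attribute `ml`) than to `c` (weight 1.0, no shared attributes), which is the intended effect of making the walks attribute-aware.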

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Oklahoma State University, Stillwater, USA
  2. Florida State University, Tallahassee, USA