Can Embedding Solve Scalability Issues for Mixed-Data Graph Clustering?

  • Nadezhda Fedorova
  • Josep Blat
  • David F. Nettleton
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)


It is widely accepted that the field of Data Analytics has entered the era of Big Data. In particular, it has to deal with so-called Big Graph Data, which is the focus of this paper. Graph data arise in many fields, such as social networks, biological networks and computer networks. Data analysts are recognized to benefit from interactive, real-time data exploration techniques, such as clustering and the ability to zoom into clusters. However, although clustering is one of the key aspects of graph data analysis, there is a lack of graph clustering algorithms scalable enough to support such interactive techniques. This paper presents an approach that combines graph clustering with graph coordinate system embedding and shows promising results in initial experiments. The approach also incorporates both structural and attribute information, which can lead to more meaningful clusters.
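The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a simple landmark-based coordinate system: each node is represented by its BFS hop distances to a few landmark nodes, chosen here as the highest-degree nodes (a hypothetical choice). Published graph coordinate systems typically go further and fit low-dimensional Euclidean coordinates to such landmark distances; this sketch skips that step. Each node's coordinates are concatenated with its numeric attribute vector, weighted by a hypothetical parameter `alpha`, and the combined vectors are clustered with plain k-means (Lloyd's algorithm with deterministic farthest-point seeding). All function names (`bfs_hops`, `landmark_embedding`, `kmeans`, `cluster_attributed_graph`) are illustrative.

```python
from collections import deque


def bfs_hops(adj, source):
    """Hop (unweighted shortest-path) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_embedding(adj, num_landmarks):
    """Represent each node by its vector of hop distances to a few landmarks.

    Assumption: landmarks are the highest-degree nodes (ties broken by node id).
    Unreachable nodes get len(adj) as a large sentinel distance.
    """
    landmarks = sorted(adj, key=lambda n: (-len(adj[n]), n))[:num_landmarks]
    tables = [bfs_hops(adj, lm) for lm in landmarks]
    sentinel = float(len(adj))
    return {n: [float(t.get(n, sentinel)) for t in tables] for n in adj}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    # Seed: first point, then repeatedly the point farthest from all chosen seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_attributed_graph(adj, attrs, k, num_landmarks=2, alpha=1.0):
    """Cluster nodes on [landmark coordinates | alpha * attribute vector]."""
    coords = landmark_embedding(adj, num_landmarks)
    nodes = sorted(adj)
    points = [coords[n] + [alpha * a for a in attrs[n]] for n in nodes]
    return dict(zip(nodes, kmeans(points, k)))


# Toy example: two 4-node cliques joined by the single edge 3-4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
attrs = {n: [0.0] if n < 4 else [1.0] for n in adj}
print(cluster_attributed_graph(adj, attrs, k=2))
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
```

The scalability argument behind this kind of design is that the embedding needs only one BFS per landmark, O(L(|V| + |E|)) for L landmarks, after which each k-means iteration works on short fixed-length vectors rather than on the graph itself. In the toy graph, structure and attributes agree, so the two cliques separate cleanly; the weight `alpha` controls how much the attributes can pull nodes away from their structural neighbours.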


Keywords: Graphs, Embedding, Scalability, Clustering



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Nadezhda Fedorova (1)
  • Josep Blat (1)
  • David F. Nettleton (1)
  1. Department of Information Technology and Communications, Universitat Pompeu Fabra, Barcelona, Spain
