A Quality Metric for Visualization of Clusters in Graphs
Abstract
Traditionally, graph quality metrics focus on readability, but recent studies show the need for metrics that are more specific to the discovery of patterns in graphs. Cluster analysis is a popular task within graph analysis, yet no existing metric explicitly quantifies how well a drawing of a graph represents its cluster structure.
We define a clustering quality metric that measures how well a node-link drawing of a graph represents the clusters contained in the graph. Experiments in which we deform graph drawings verify that our metric effectively captures variations in the visual cluster quality of those drawings. We then use our metric to examine how well different graph drawing algorithms visualize cluster structures in various graphs; the results confirm that some algorithms specifically designed to show cluster structures perform better than other algorithms.
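The abstract does not spell out how the metric is computed, so the following is only a minimal sketch of one plausible reading: cluster the node positions of a drawing geometrically and score how well that grouping agrees with a reference clustering of the graph. The choice of k-means for the geometric step and the plain Rand index for the comparison are assumptions made for illustration, and the names `kmeans`, `rand_index`, and `cluster_drawing_quality` are hypothetical rather than taken from the paper.

```python
import math
from itertools import combinations


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding.

    Assumption: a simple stand-in for whatever geometric clustering
    a real implementation of the metric would use.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def rand_index(a, b):
    """Fraction of node pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def cluster_drawing_quality(positions, reference_labels):
    """Hypothetical score in [0, 1]: agreement between the reference
    clustering and a geometric clustering of the drawing."""
    k = len(set(reference_labels))
    geometric_labels = kmeans(positions, k)
    return rand_index(reference_labels, geometric_labels)


reference = [0, 0, 0, 1, 1, 1]
# A drawing that places each cluster in its own region.
clear_drawing = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# A deformed drawing in which two nodes have swapped regions.
deformed_drawing = [(0, 0), (10, 10), (1, 0), (0, 1), (10, 11), (11, 10)]

print(cluster_drawing_quality(clear_drawing, reference))
print(cluster_drawing_quality(deformed_drawing, reference))
```

In this toy setup the drawing that keeps each cluster spatially separate scores higher than the deformed one, which mirrors the kind of deformation experiment the abstract describes. A chance-corrected measure would be a natural substitute for the plain Rand index in practice.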