Neighborhood graphs, stripes and shadow plots for cluster visualization

Abstract

Centroid-based partitioning cluster analysis is a popular method for segmenting data into more homogeneous subgroups. Visualization greatly helps in understanding the positions of these subgroups relative to each other in higher-dimensional spaces and in assessing the quality of partitions. In this paper, we present several improvements on existing cluster displays, using neighborhood graphs with edge weights based on cluster separation and convex hulls of inner and outer cluster regions. A new display called shadow-stars can be used to diagnose pairwise cluster separation with respect to the distribution of the original data. Artificial data and two case studies with real data are used to demonstrate the techniques.

Author information

Corresponding author

Correspondence to Friedrich Leisch.

About this article

Cite this article

Leisch, F. Neighborhood graphs, stripes and shadow plots for cluster visualization. Stat Comput 20, 457–469 (2010). https://doi.org/10.1007/s11222-009-9137-8

Keywords

  • Cluster analysis
  • Partition
  • Centroid
  • Convex hull
  • R