Abstract
Centroid-based partitioning cluster analysis is a popular method for segmenting data into more homogeneous subgroups. Visualization helps considerably in understanding the positions of these subgroups relative to each other in higher-dimensional spaces and in assessing the quality of partitions. In this paper, we present several improvements on existing cluster displays using neighborhood graphs with edge weights based on cluster separation and convex hulls of inner and outer cluster regions. A new display called shadow-stars can be used to diagnose pairwise cluster separation with respect to the distribution of the original data. Artificial data and two case studies with real data are used to demonstrate the techniques.
Cite this article
Leisch, F. Neighborhood graphs, stripes and shadow plots for cluster visualization. Stat Comput 20, 457–469 (2010). https://doi.org/10.1007/s11222-009-9137-8
Keywords
- Cluster analysis
- Partition
- Centroid
- Convex hull
- R