Abstract
Data visualization can greatly enhance our understanding of multivariate data structures. It is therefore no surprise that cluster analysis and data visualization often go hand in hand, or that textbooks such as Gordon (1999) and Everitt et al. (2001) are full of figures. In particular, hierarchical cluster analysis is almost always accompanied by a dendrogram. Results from partitioning cluster analysis can be visualized by projecting the data into two-dimensional space or by using parallel coordinates. Cluster membership is usually shown with different colors and glyphs, or by placing clusters in separate panels of a trellis display (Becker et al., 1996). In addition, silhouette plots (Rousseeuw, 1987) are a popular tool for diagnosing the quality of a partition. Part of the popularity of self-organizing feature maps (Kohonen, 1989) among practitioners in various fields can be explained by the fact that their results can be “easily” visualized.
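The silhouette plot mentioned above is built from one number per observation, its silhouette width. The following is a minimal sketch of how those widths could be computed, following Rousseeuw's (1987) definition: s(i) = (b(i) − a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from observation i to the other members of its own cluster, and b(i) is the smallest mean distance from i to the members of any other cluster. The sketch assumes Euclidean distance and assigns a width of 0 to singleton clusters. The function names `silhouette_widths` and `text_silhouette_plot` are hypothetical illustrations and do not belong to any R or Python package. The crude text rendering only hints at the real plot, which draws one horizontal bar per observation, grouped by cluster.

```python
import math
from collections import defaultdict


def silhouette_widths(points, labels):
    """Silhouette width s(i) for each observation (Rousseeuw, 1987).

    Assumes Euclidean distance; singleton clusters get s(i) = 0.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouettes need at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return widths


def text_silhouette_plot(widths, labels, width=40):
    """Crude text version of a silhouette plot: one bar per observation,
    grouped by cluster and sorted by decreasing width within each cluster.
    Negative widths are printed with an empty bar."""
    lines = []
    for lab in sorted(set(labels)):
        members = sorted((w for w, l in zip(widths, labels) if l == lab), reverse=True)
        for w in members:
            bar = "#" * round(max(w, 0.0) * width)
            lines.append(f"{lab!s:>3} {w:+.2f} {bar}")
    return lines


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
    good = silhouette_widths(pts, [1, 1, 2, 2])  # well-separated partition
    bad = silhouette_widths(pts, [1, 2, 1, 2])   # points mixed across clusters
    print("\n".join(text_silhouette_plot(good, [1, 1, 2, 2])))
    print("mean width (good):", sum(good) / len(good))
    print("mean width (bad): ", sum(bad) / len(bad))
```

In this toy example, the well-separated partition yields widths close to 1, while the mixed partition yields negative widths for every observation. This is the behaviour that makes the average silhouette width a common diagnostic of partition quality.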
References
Becker, R., Cleveland, W. and Shyu, M.-J. (1996). The visual design and control of trellis display, Journal of Computational and Graphical Statistics 5:123–155.
Everitt, B.S., Landau, S. and Leese, M. (2001). Cluster Analysis, 4th edn, Arnold, London, UK.
Fraley, C. and Raftery, A.E. (2002). Model-based clustering, discriminant analysis and density estimation, Journal of the American Statistical Association 97:611–631.
Friendly, M. (2000). Visualizing Categorical Data, SAS Press, Cary, NC. ISBN 1-58025-660-0.
Gordon, A.D. (1999). Classification, 2nd edn, Chapman & Hall / CRC, Boca Raton, FL, USA.
Hartigan, J.A. (1975). Clustering Algorithms, Wiley, New York.
Hartigan, J.A. and Kleiner, B. (1984). A mosaic of television ratings, The American Statistician 38(1):32–35.
Hartigan, J.A. and Wong, M.A. (1979). Algorithm AS136: A k-means clustering algorithm, Applied Statistics 28(1):100–108.
Hennig, C. (2004). Asymmetric linear dimension reduction for classification, Journal of Computational and Graphical Statistics 13(4):1–17.
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data, Wiley, New York.
Kohonen, T. (1989). Self-organization and Associative Memory, 3rd edn, Springer, New York.
Lance, G.N. and Williams, W.T. (1967). A general theory of classificatory sorting strategies: I. Hierarchical systems, Computer Journal 9:373–380.
Leisch, F. (2004). Exploring the structure of mixture model components, in J. Antoch (ed), Compstat 2004 – Proceedings in Computational Statistics, Physica Verlag, Heidelberg, pp. 1405–1412. ISBN 3-7908-1554-3.
Leisch, F. (2006). A toolbox for k-centroids cluster analysis, Computational Statistics and Data Analysis 51(2):526–544.
Mächler, M., Rousseeuw, P., Struyf, A. and Hubert, M. (2005). cluster: Cluster Analysis. R package version 1.10.0.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations, in Le Cam, L.M. and Neyman, J. (eds), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, pp. 281–297.
Martinetz, T. and Schulten, K. (1994). Topology representing networks, Neural Networks 7(3):507–522.
Meyer, D., Zeileis, A. and Hornik, K. (2005). vcd: Visualizing Categorical Data. R package version 0.9-5.
Milligan, G.W. and Cooper, M.C. (1985). An examination of procedures for determining the number of clusters in a data set, Psychometrika 50(2):159–179.
Murrell, P. (2005). R Graphics, Chapman & Hall / CRC, Boca Raton, FL.
Pison, G., Struyf, A. and Rousseeuw, P.J. (1999). Displaying a clustering with CLUSPLOT, Computational Statistics and Data Analysis 30:381–392.
R Development Core Team (2007). R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics 20:53–65.
Rousseeuw, P.J., Ruts, I. and Tukey, J.W. (1999). The bagplot: A bivariate boxplot, The American Statistician 53(4):382–387.
Tantrum, J., Murua, A. and Stuetzle, W. (2003). Assessment and pruning of hierarchical model based clustering, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, pp. 197–205. ISBN 1-58113-737-0.
Warnes, G.R. (2005). gplots: Various R programming tools for plotting data. R package version 2.0.8.
Wedel, M. and DeSarbo, W.S. (1995). A mixture likelihood approach for generalized linear models, Journal of Classification 12:21–55.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Leisch, F. (2008). Visualizing Cluster Analysis and Finite Mixture Models. In: Handbook of Data Visualization. Springer Handbooks Comp.Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-33037-0_22
DOI: https://doi.org/10.1007/978-3-540-33037-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33036-3
Online ISBN: 978-3-540-33037-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)