Visualizing Cluster Analysis and Finite Mixture Models

Leisch, Friedrich

doi:10.1007/978-3-540-33037-0_22

Part of the book series: Springer Handbooks Comp.Statistics (SHCS)

Abstract

Data visualization can greatly enhance our understanding of multivariate data structures, and so it is no surprise that cluster analysis and data visualization often go hand in hand, and that textbooks like Gordon (1999) or Everitt et al. (2001) are full of figures. In particular, hierarchical cluster analysis is almost always accompanied by a dendrogram. Results from partitioning cluster analysis can be visualized by projecting the data into two-dimensional space or using parallel coordinates. Cluster membership is usually represented by different colors and glyphs, or by dividing clusters into several panels of a trellis display (Becker et al., 1996). In addition, silhouette plots (Rousseeuw, 1987) provide a popular tool for diagnosing the quality of a partition. Some of the popularity of self-organizing feature maps (Kohonen, 1989) with practitioners in various fields can be explained by the fact that the results can be “easily” visualized.

References

Becker, R., Cleveland, W. and Shyu, M.-J. (1996). The visual design and control of trellis display, Journal of Computational and Graphical Statistics 5:123–155.
Everitt, B.S., Landau, S. and Leese, M. (2001). Cluster Analysis, 4th edn, Arnold, London, UK.
Fraley, C. and Raftery, A.E. (2002). Model-based clustering, discriminant analysis and density estimation, Journal of the American Statistical Association 97:611–631.
Friendly, M. (2000). Visualizing Categorical Data, SAS Press, Cary, NC. ISBN 1-58025-660-0.
Gordon, A.D. (1999). Classification, 2nd edn, Chapman & Hall / CRC, Boca Raton, FL, USA.
Hartigan, J.A. (1975). Clustering Algorithms, Wiley, New York.
Hartigan, J.A. and Kleiner, B. (1984). A mosaic of television ratings, The American Statistician 38(1):32–35.
Hartigan, J.A. and Wong, M.A. (1979). Algorithm AS136: A k-means clustering algorithm, Applied Statistics 28(1):100–108.
Hennig, C. (2004). Asymmetric linear dimension reduction for classification, Journal of Computational and Graphical Statistics 13(4):1–17.
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data, Wiley, New York.
Kohonen, T. (1989). Self-organization and Associative Memory, 3rd edn, Springer, New York.
Lance, G.N. and Williams, W.T. (1967). A general theory of classification sorting strategies I. Hierarchical systems, Computer Journal 9:373–380.
Leisch, F. (2004). Exploring the structure of mixture model components, in J. Antoch (ed), Compstat 2004 – Proceedings in Computational Statistics, Physica Verlag, Heidelberg, pp. 1405–1412. ISBN 3-7908-1554-3.
Leisch, F. (2006). A toolbox for k-centroids cluster analysis, Computational Statistics and Data Analysis 51(2):526–544.
Mächler, M., Rousseeuw, P., Struyf, A. and Hubert, M. (2005). cluster: Cluster Analysis. R package version 1.10.0.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations, in Le Cam, L.M. and Neyman, J. (eds), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, pp. 281–297.
Martinetz, T. and Schulten, K. (1994). Topology representing networks, Neural Networks 7(3):507–522.
Meyer, D., Zeileis, A. and Hornik, K. (2005). vcd: Visualizing Categorical Data. R package version 0.9-5.
Milligan, G.W. and Cooper, M.C. (1985). An examination of procedures for determining the number of clusters in a data set, Psychometrika 50(2):159–179.
Murrell, P. (2005). R Graphics, Chapman & Hall / CRC, Boca Raton, FL.
Pison, G., Struyf, A. and Rousseeuw, P.J. (1999). Displaying a clustering with CLUSPLOT, Computational Statistics and Data Analysis 30:381–392.
R Development Core Team (2007). R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics 20:53–65.
Rousseeuw, P.J., Ruts, I. and Tukey, J.W. (1999). The bagplot: A bivariate boxplot, The American Statistician 53(4):382–387.
Tantrum, J., Murua, A. and Stuetzle, W. (2003). Assessment and pruning of hierarchical model based clustering, Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, pp. 197–205. ISBN 1-58113-737-0.
Warnes, G.R. (2005). gplots: Various R programming tools for plotting data. R package version 2.0.8.
Wedel, M. and DeSarbo, W.S. (1995). A mixture likelihood approach for generalized linear models, Journal of Classification 12:21–55.

Author information

Authors and Affiliations

Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany
Friedrich Leisch

About this chapter

Cite this chapter

Leisch, F. (2008). Visualizing Cluster Analysis and Finite Mixture Models. In: Handbook of Data Visualization. Springer Handbooks Comp.Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-33037-0_22

DOI: https://doi.org/10.1007/978-3-540-33037-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33036-3
Online ISBN: 978-3-540-33037-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
