Cluster Analysis

Backhaus, Klaus; Erichson, Bernd; Gensler, Sonja; Weiber, Rolf; Weiber, Thomas

doi:10.1007/978-3-658-32589-3_8

Abstract

Cluster analysis is a procedure for grouping cases (objects of investigation) in a data set. In a first step, the similarity or dissimilarity (distance) between the cases is determined using a suitable proximity measure. In a second step, a fusion algorithm is chosen that successively combines the individual cases into groups (clusters). The goal is to form groups whose members are as similar as possible with respect to the segmentation variables considered (homogeneous groups), while the groups themselves should be as dissimilar from one another as possible. Cluster analysis procedures can handle variables measured on metric scales, non-metric scales, or a mix of both. The chapter focuses on hierarchical agglomerative clustering methods and presents the single-linkage method and Ward's method in detail. Finally, k-means clustering and two-step cluster analysis, two partitioning cluster methods, are also explained; these methods offer particular advantages when working with large data sets.
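The two steps named in the abstract (measuring proximity between cases, then fusing them) and the two hierarchical methods it highlights can be illustrated with a minimal Python sketch. It assumes metric variables and Euclidean distance; the function names, the generic `agglomerate` loop and the five cases are hypothetical and do not reproduce the chapter's example or the SPSS CLUSTER procedure. Single linkage scores a candidate fusion by the distance between the two closest members; Ward's method scores it by the increase in the within-cluster error sum of squares (ESS).

```python
import math
from itertools import combinations


def single_linkage_cost(a, b, cases):
    """Single linkage: Euclidean distance between the closest members of clusters a and b."""
    return min(math.dist(cases[i], cases[j]) for i in a for j in b)


def ess(cluster, cases):
    """Error sum of squares: squared deviations of the members from their cluster mean."""
    dims = range(len(cases[0]))
    centroid = [sum(cases[i][k] for i in cluster) / len(cluster) for k in dims]
    return sum((cases[i][k] - centroid[k]) ** 2 for i in cluster for k in dims)


def ward_cost(a, b, cases):
    """Ward's method: increase in the total ESS caused by fusing clusters a and b."""
    return ess(a | b, cases) - ess(a, cases) - ess(b, cases)


def agglomerate(cases, cost):
    """Hierarchical agglomerative clustering with a pluggable fusion criterion.

    Starts with every case as its own cluster and repeatedly fuses the pair with the
    lowest cost. Returns an agglomeration schedule of (members_a, members_b, cost) rows.
    """
    clusters = [frozenset([i]) for i in range(len(cases))]
    schedule = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cost(p[0], p[1], cases))
        schedule.append((sorted(a), sorted(b), cost(a, b, cases)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)  # the fused cluster replaces its two parts
    return schedule


# Hypothetical data: five cases measured on two metric variables.
cases = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 1)]
for name, cost in [("single linkage", single_linkage_cost), ("Ward", ward_cost)]:
    print(name)
    for a, b, c in agglomerate(cases, cost):
        print("  fuse", a, "and", b, "at", round(c, 3))
```

With these hypothetical data, both methods agree on the first two fusions but diverge at the third: single linkage joins {0, 1} with {2, 3} and leaves case 4 isolated until the last step, whereas Ward's method joins case 4 with {2, 3}.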

Notes

1.
In diagram A, the two characteristics “income” and “age” are not independent of each other. This means that the same two-cluster solution could have been achieved on the basis of only one of the two characteristics. On the independence of cluster variables, see Sect. 8.2.1.
2.
The proximity measures shown in Table 8.4 were selected from those provided in the SPSS procedure “Hierarchical Cluster Analysis”.
3.
On the website www.multivariate-methods.info, we provide supplementary material (e.g., Excel files) to deepen the reader’s understanding of the methodology.
4.
To simplify the following calculations, only integer values were included in the initial data matrix.
5.
On the standardization of variables, see the comments on statistical basics in Sect. 1.2.1.
6.
A detailed description of the calculation of the correlation coefficient may be found in Sect. 1.2.2.
7.
Due to their rather minor practical importance, divisive clustering procedures are not discussed here. If you wish to apply a divisive clustering algorithm, you can do so in SPSS via ‘Analyze/Classify/Tree’.
8.
The course of a fusion process is usually illustrated by a table (the so-called agglomeration schedule) and by a dendrogram or an icicle diagram. Both options are explained in detail for the single-linkage method in Sect. 8.2.3.2.1.
9.
For the extended example, the dendrograms were created using the procedure CLUSTER in SPSS (see Sect. 8.3.2).
10.
The agglomeration schedule was also created using the procedure CLUSTER in SPSS.
11.
Since SPSS provides no criteria for determining the optimal number of clusters, we recommend using alternative programs such as S-Plus, R or SAS and, where available, the cubic clustering criterion (CCC).
12.
For a brief summary of the basics of statistical testing, see Sect. 1.3.
13.
In addition to k-means cluster analysis (KM-CA), two-step cluster analysis may also be used to optimize a clustering solution found by another procedure. Both methods belong to the partitioning cluster methods described in detail in Sect. 8.4.2.
14.
On the problem of outliers, see also the comments in Sect. 1.5.1.
15.
For more detailed considerations on the robustness of cluster analyses, see García-Escudero et al. (2010, p. 89).
16.
For a brief summary of the basics of statistical testing, see Sect. 1.3.
17.
Missing values are a frequent and, unfortunately, unavoidable problem when conducting surveys (e.g., because respondents cannot or do not want to answer a question, or because of interviewer errors). The handling of missing values in empirical studies is discussed in Sect. 1.5.2.
18.
The mean values were calculated on the basis of the data set that was also used in the case study of discriminant analysis (Chap. 4), logistic regression (Chap. 5) and factor analysis (Chap. 7). Using the same case study allows us to illustrate the similarities and differences between the methods.
19.
Multinomial logistic regression requires at least three groups. For a two-cluster solution, a binary logistic regression would have to be performed instead.
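Note 5 refers to the standardization of variables. Standardization matters in cluster analysis because distance measures such as the Euclidean distance are dominated by variables with large numeric ranges, for example income in euros compared with age in years. A minimal Python sketch of z-standardization, assuming the sample standard deviation is used; the function names and the three cases are hypothetical:

```python
from statistics import mean, stdev


def z_standardize(values):
    """Return z-scores (x - mean) / s, using the sample standard deviation s."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(x - m) / s for x in values]


def standardize_columns(rows):
    """Standardize each variable (column) of a cases-by-variables matrix separately."""
    columns = [z_standardize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical cases: monthly income in euros and age in years.
rows = [(2000, 25), (3000, 35), (4000, 45)]
print(standardize_columns(rows))
```

Before standardization, the income difference between the first two cases (1,000) would swamp their age difference (10) in a Euclidean distance; afterwards, both variables differ by one standard deviation.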
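Note 11 recommends software that offers formal criteria for the number of clusters, such as the cubic clustering criterion (CCC). The CCC itself is not reproduced here. As a simpler, swapped-in illustration, the sketch below computes the variance-ratio criterion of Calinski and Harabasz (1974), which appears in the reference list: the between-cluster sum of squares B per degree of freedom divided by the within-cluster sum of squares W per degree of freedom, CH = [B/(K − 1)] / [W/(N − K)], where larger values suggest a better-separated partition. The function name, the data and the candidate partitions are hypothetical.

```python
def calinski_harabasz(cases, labels):
    """Variance-ratio criterion (B / (K - 1)) / (W / (N - K)) for a given partition.

    B is the between-cluster and W the within-cluster sum of squares; larger is better.
    """
    n, dims = len(cases), range(len(cases[0]))
    groups = sorted(set(labels))
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("the criterion needs 2 <= K < N clusters")
    grand_mean = [sum(c[d] for c in cases) / n for d in dims]
    between = within = 0.0
    for g in groups:
        members = [c for c, lab in zip(cases, labels) if lab == g]
        centroid = [sum(m[d] for m in members) / len(members) for d in dims]
        between += len(members) * sum((centroid[d] - grand_mean[d]) ** 2 for d in dims)
        within += sum((m[d] - centroid[d]) ** 2 for m in members for d in dims)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical data and candidate partitions (one cluster label per case).
cases = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 1)]
candidates = {2: [0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2], 4: [0, 0, 1, 2, 3]}
for k, labels in candidates.items():
    print(k, "clusters:", round(calinski_harabasz(cases, labels), 3))
```

For these hypothetical partitions the criterion peaks at three clusters; as with any such index, it is a heuristic aid rather than a definitive answer.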
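Note 13 mentions that k-means cluster analysis can optimize a clustering solution found by another procedure. A minimal sketch of that idea, assuming Lloyd-style iterations with squared Euclidean distance: starting from a given partition, the cluster centroids are recomputed and each case is reassigned to its nearest centroid until no case changes cluster. This is not the SPSS implementation; the function name, the iteration cap and the data are hypothetical.

```python
def k_means(cases, labels, max_iter=100):
    """Refine an initial partition with Lloyd-style k-means iterations.

    Alternates between (1) computing each cluster's centroid and (2) reassigning every
    case to the nearest centroid by squared Euclidean distance, until nothing changes.
    A cluster that loses all its members simply disappears in this sketch.
    """
    dims = range(len(cases[0]))
    labels = list(labels)
    for _ in range(max_iter):
        groups = sorted(set(labels))
        centroids = {}
        for g in groups:
            members = [c for c, lab in zip(cases, labels) if lab == g]
            centroids[g] = [sum(m[d] for m in members) / len(members) for d in dims]
        new_labels = [
            min(groups, key=lambda g: sum((c[d] - centroids[g][d]) ** 2 for d in dims))
            for c in cases
        ]
        if new_labels == labels:
            break  # converged: no case changed its cluster
        labels = new_labels
    return labels


cases = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 1)]  # hypothetical data
start = [0, 1, 0, 1, 1]  # hypothetical initial partition from another procedure
print(k_means(cases, start))
```

Here the starting partition groups cases 0 and 2 together; after two passes the sketch settles on the partition {0, 1} and {2, 3, 4}. Because k-means only reaches a local optimum, the result can depend on the starting partition, which is why a sensible start from another procedure is useful.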

References

Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics – Theory and Methods, 3(1), 1–27.
García-Escudero, L., Gordaliza, A., Matrán, C., & Mayo-Iscar, A. (2010). A review of robust clustering methods. Advances in Data Analysis and Classification, 4, 89–109.
Kline, R. (2011). Principles and practice of structural equation modeling (3rd ed.). New York: Guilford Press.
Lance, G. H., & Williams, W. T. (1966). A general theory of classification sorting strategies I. Hierarchical systems. The Computer Journal, 9, 373–380.
Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45(3), 325–342.
Milligan, G. W., & Cooper, M. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159–179.
Mojena, R. (1977). Hierarchical clustering methods and stopping rules: An evaluation. The Computer Journal, 20(4), 359–363.
Punj, G., & Stewart, D. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20(2), 134–148.
Wedel, M., & Kamakura, W. A. (2000). Market segmentation: Conceptual and methodological foundations (2nd ed.). Boston: Springer.
Wind, Y. (1978). Issues and advances in segmentation research. Journal of Marketing Research, 15(3), 317–337.

Author information

Authors and Affiliations

Institute of Business-to-Business Marketing, Marketing Center Münster, University of Münster, Münster, Germany
Klaus Backhaus
Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
Bernd Erichson
Chair for Value-Based-Marketing, Marketing Center Münster, University of Münster, Münster, Germany
Sonja Gensler
Chair of Marketing and Innovation, University of Trier, Trier, Germany
Rolf Weiber
Munich, Germany
Thomas Weiber

Corresponding author

Correspondence to Klaus Backhaus.

About this chapter

Cite this chapter

Backhaus, K., Erichson, B., Gensler, S., Weiber, R., Weiber, T. (2021). Cluster Analysis. In: Multivariate Analysis. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-32589-3_8

DOI: https://doi.org/10.1007/978-3-658-32589-3_8
Published: 14 October 2021
Publisher Name: Springer Gabler, Wiesbaden
Print ISBN: 978-3-658-32588-6
Online ISBN: 978-3-658-32589-3
eBook Packages: Business and Economics (German Language)
