Evaluation of structure and reproducibility of cluster solutions using the bootstrap

Abstract

Segmentation results derived using cluster analysis depend on (1) the structure of the data and (2) algorithm parameters. Typically, neither the data structure nor the sensitivity of the analysis to changes in algorithm parameters is assessed before clustering. We propose a benchmarking framework based on bootstrapping techniques that accounts for both sample randomness and algorithm randomness. The framework provides much-needed guidance, to data analysts and to users of clustering solutions alike, on choosing the final clusters from computations that are exploratory in nature.
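The bootstrap idea described above can be illustrated with a small experiment. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes k-means (Lloyd's algorithm with several random starts) as the clustering algorithm and the adjusted Rand index as the agreement measure. The helper names (`kmeans`, `adjusted_rand`, `bootstrap_stability`) and all parameter values are hypothetical and chosen only for illustration. Sample randomness enters by drawing two bootstrap samples per replication, and algorithm randomness enters through the random initial centroids. Both fitted solutions then partition the original data, so the two partitions can be compared point by point. Consistently high agreement suggests a reproducible cluster solution.

```python
# Illustrative sketch; all names and parameter values are hypothetical.
import random
import statistics
from collections import Counter
from math import comb


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, rng, n_starts=20, max_iter=100):
    """Lloyd's k-means from n_starts random initialisations.

    Assumption: the random initial centroids model the 'algorithm randomness'.
    Keeps the solution with the smallest within-cluster sum of squares.
    """
    distinct = sorted(set(points))
    best, best_sse = None, float("inf")
    for _ in range(n_starts):
        centroids = rng.sample(distinct, k)
        for _ in range(max_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[nearest(p, centroids)].append(p)
            # New centroid = mean of its group; an emptied group keeps its old centroid.
            new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
                   for j, g in enumerate(groups)]
            if new == centroids:  # assignments stopped changing
                break
            centroids = new
        sse = sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)
        if sse < best_sse:
            best, best_sse = centroids, sse
    return best


def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    n_pairs = comb(len(a), 2)
    sum_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / n_pairs
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate: both all-in-one or both all-singletons
        return 1.0
    return (sum_ab - expected) / (maximum - expected)


def bootstrap_stability(data, k, n_pairs=20, seed=0):
    """ARI between pairs of k-means solutions fitted to independent bootstrap samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_pairs):
        partitions = []
        for _ in range(2):
            sample = [rng.choice(data) for _ in data]  # n draws with replacement
            centroids = kmeans(sample, k, rng)
            # Partition the ORIGINAL data with this solution's centroids.
            partitions.append([nearest(p, centroids) for p in data])
        scores.append(adjusted_rand(*partitions))
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    centres = [(0, 0), (10, 0), (0, 10)]
    blobs = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centres for _ in range(20)]
    noise = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
    for name, data in [("three blobs", blobs), ("uniform noise", noise)]:
        print(name, "median ARI:", round(statistics.median(bootstrap_stability(data, 3)), 3))
```

The script compares three well-separated synthetic groups with structureless uniform data. Under these assumptions one would expect the median agreement to be close to 1 for the grouped data and lower for the uniform data, although the exact values depend on the random seed. In practice the procedure would be repeated for each candidate number of clusters k, and the whole distribution of agreement scores inspected rather than only its median.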

Author information

Correspondence to Friedrich Leisch.

About this article

Cite this article

Dolnicar, S., Leisch, F. Evaluation of structure and reproducibility of cluster solutions using the bootstrap. Mark Lett 21, 83–101 (2010). https://doi.org/10.1007/s11002-009-9083-4

Keywords

  • Cluster analysis
  • Mixture models
  • Bootstrap