
Evaluation of structure and reproducibility of cluster solutions using the bootstrap

Marketing Letters

Abstract

Segmentation results derived using cluster analysis depend on (1) the structure of the data and (2) algorithm parameters. Typically, neither the data structure nor the sensitivity of the analysis to changes in algorithm parameters is assessed before clustering. We propose a benchmarking framework based on bootstrapping techniques that accounts for sample and algorithm randomness. This provides much-needed guidance to both data analysts and users of clustering solutions regarding the choice of the final clusters from computations that are exploratory in nature.
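The abstract does not spell out the procedure, so the following is only a minimal sketch of one common way to assess reproducibility with the bootstrap, not the authors' implementation. It assumes plain k-means with random starting centroids as the clustering algorithm and the adjusted Rand index (Hubert and Arabie 1985) as the agreement measure. The function names `kmeans`, `adjusted_rand_index`, and `bootstrap_stability`, the toy data, and all parameter defaults are hypothetical choices made for illustration.

```python
import random
from collections import Counter
from math import comb


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means; random starting centroids are the algorithm randomness."""
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def bootstrap_stability(data, k, n_pairs=20, seed=0):
    """Agreement of k-cluster solutions fitted on pairs of bootstrap samples."""
    rng = random.Random(seed)
    scores = []
    for _pair in range(n_pairs):
        labels = []
        for _half in range(2):
            # Sample randomness: draw len(data) rows with replacement.
            sample = rng.choices(data, k=len(data))
            centroids = kmeans(sample, k, rng)
            # Compare solutions on the original data, not on the samples.
            labels.append([nearest(p, centroids) for p in data])
        scores.append(adjusted_rand_index(labels[0], labels[1]))
    return scores


# Toy data (an assumption for illustration): two well-separated groups.
toy_rng = random.Random(1)
data = ([(toy_rng.gauss(0, 0.3), toy_rng.gauss(0, 0.3)) for _ in range(40)]
        + [(toy_rng.gauss(5, 0.3), toy_rng.gauss(5, 0.3)) for _ in range(40)])
for k in (2, 3, 4):
    scores = bootstrap_stability(data, k)
    print(k, round(sum(scores) / len(scores), 3))
```

Under these assumptions, a number of clusters whose scores stay close to 1 across bootstrap pairs would count as reproducible, while widely scattered scores suggest that the solution depends heavily on the particular sample or starting values.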




Author information

Corresponding author

Correspondence to Friedrich Leisch.

About this article

Cite this article

Dolnicar, S., Leisch, F. Evaluation of structure and reproducibility of cluster solutions using the bootstrap. Mark Lett 21, 83–101 (2010). https://doi.org/10.1007/s11002-009-9083-4

  • DOI: https://doi.org/10.1007/s11002-009-9083-4
