Abstract
Segmentation results derived using cluster analysis depend on (1) the structure of the data and (2) algorithm parameters. Typically, neither the data structure nor the sensitivity of the analysis to changes in algorithm parameters is assessed before clustering. We propose a benchmarking framework based on bootstrapping techniques that accounts for both sample and algorithm randomness. The framework provides much-needed guidance to data analysts and users of clustering solutions on choosing the final clusters from computations that are exploratory in nature.
Cite this article
Dolnicar, S., Leisch, F. Evaluation of structure and reproducibility of cluster solutions using the bootstrap. Mark Lett 21, 83–101 (2010). https://doi.org/10.1007/s11002-009-9083-4
Keywords
- Cluster analysis
- Mixture models
- Bootstrap