Cluster Analysis in Marketing Research

Living reference work entry




Cluster analysis is an exploratory tool for compressing data into a smaller number of groups or representative points. The latter are meant to summarize the underlying data structure well enough that the analyst can work with them instead of the complete data set. Because of this data compression property, cluster analysis remains an essential part of the marketing analyst’s toolbox in today’s data-rich business environment. This chapter gives an overview of the main approaches and methods for cluster analysis and links them to the most relevant marketing research contexts. We also provide pointers to the specific packages and functions for performing cluster analysis in the R ecosystem for statistical computing. A substantial part of the chapter illustrates the application of different clustering procedures to a reference data set of shopping basket data. For each technique considered, we briefly outline the general approach, walk through the R code required to perform the analysis, and offer some interpretation of the results.


Keywords: Cluster analysis; Hierarchical clustering; k-centroid clustering; k-medoid clustering; Marketing analysis; Marketing research



Authors and Affiliations

  1. Department of Marketing, WU Vienna University of Economics and Business, Vienna, Austria
  2. Department of New Media, Modul University Vienna, Vienna, Austria
