Joint selection of variables and clusters: recovering the underlying structure of marketing data

Brudvig, Susan; Brusco, Michael J.; Cradit, J. Dennis

doi:10.1057/s41270-018-0045-7

Original Article
Published: 08 November 2018

Volume 7, pages 1–12 (2019)
Journal of Marketing Analytics

Susan Brudvig¹,
Michael J. Brusco² &
J. Dennis Cradit²

Abstract

Clustering observations into groups is perhaps one of the more common marketing analytic techniques. Many variable-selection procedures are available for clustering, and some have exhibited good performance in simulation studies. Unfortunately, the best-performing methods often fail because they emphasize the clustering power of individual variables. For this reason, we recommend extreme caution when using the existing procedures, and we argue that enumeration of all-possible variable subsets is a preferred strategy. We also address a common decision problem—the selection of the number of clusters—and develop an index which can help guide the joint selection of variables and clusters. By way of an empirical example, we illustrate the variable-selection problem and demonstrate the use of the proposed index to jointly select variables and clusters in K-means partitioning.
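As a rough illustration of what enumerating all-possible variable subsets jointly with the number of clusters involves, the sketch below runs a small K-means over every nonempty subset and every candidate K. It is a minimal sketch under stated assumptions, not the authors' procedure: the synthetic data, the function names, and the criterion (proportion of variance explained) are all hypothetical stand-ins, and the criterion is not the index proposed in the article, which this preview does not specify.

```python
from itertools import combinations
import random


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, n_starts=10, n_iter=100, seed=0):
    """Smallest within-cluster sum of squares over several Lloyd runs."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # random initial centers
        labels = None
        for _ in range(n_iter):
            new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                          for p in points]
            if new_labels == labels:  # assignments stable: converged
                break
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an emptied cluster keeps its old center
                    centers[c] = tuple(sum(col) / len(members)
                                       for col in zip(*members))
        wcss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
        best = min(best, wcss)
    return best


def variance_explained(points, k):
    """Stand-in criterion (an assumption): 1 - WCSS / total sum of squares."""
    grand_mean = tuple(sum(col) / len(points) for col in zip(*points))
    tss = sum(sqdist(p, grand_mean) for p in points)
    if tss == 0:
        return 0.0
    return 1.0 - kmeans_wcss(points, k) / tss


def enumerate_subsets_and_k(data, ks):
    """Score every (variable subset, K) pair; 2^J - 1 subsets for J variables."""
    n_vars = len(data[0])
    results = {}
    for size in range(1, n_vars + 1):
        for subset in combinations(range(n_vars), size):
            projected = [tuple(row[j] for j in subset) for row in data]
            for k in ks:
                results[(subset, k)] = variance_explained(projected, k)
    return results


# Hypothetical data: variables 0 and 1 separate two groups, variable 2 is noise.
rng = random.Random(1)
data = [(rng.gauss(m, 1), rng.gauss(m, 1), rng.gauss(0, 1))
        for m in (0, 5) for _ in range(10)]
results = enumerate_subsets_and_k(data, ks=(2, 3))
for (subset, k), score in sorted(results.items(), key=lambda kv: -kv[1])[:5]:
    print(subset, k, round(score, 3))
```

Raw variance explained tends to grow with K and is not directly comparable across subsets of different sizes, so it should not be used as a selection rule on its own; it only serves here to make the enumeration loop concrete.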

Notes

The terms partitioning and clustering are often used interchangeably. A partitioning method separates a set of n objects into K nonempty, nonoverlapping, and exhaustive subsets. These K subsets are typically termed clusters or groups. By contrast, clustering methods include partitioning methods, but may also encompass methods that do not directly produce partitions, such as hierarchical clustering, overlapping clustering, and fuzzy clustering methods. In this paper, we limit our focus to partitioning methods, so both partitioning and clustering are valid descriptors of the method.
Marketing applications can easily involve considerably more clustering variables in hyper-dimensional space. However, exhaustive enumeration of all-possible subsets becomes impractical for datasets with a large number of candidate variables, because J candidate variables yield 2^J − 1 nonempty subsets. For J > 15, Steinley and Brusco (2008a, b) suggest replacing exhaustive enumeration of subsets with a tree-search heuristic. The tree size is controlled by limiting new branches from j to j + 1 variables to the 10 best candidates at each stage. For further discussion of the technique, see Steinley and Brusco (2008a, b), who tested the approach on a number of actual and synthetic datasets.
Blockbusters commonly are defined as pharmaceutical products garnering at least one billion dollars in sales annually (Li 2014).
We omit all descriptive details of a full cluster analysis, largely because one of our goals is expository. We note that an important step after interpreting the solutions is to validate them using variables not included in the cluster analysis. For instance, product revenue can be used to establish criterion validity of the APS solution (F(5, 281) = 4.06, p < 0.01), with products in Cluster 2 generating significantly more revenue than those in other clusters. This is consistent with our interpretation of this cluster as identifying potential blockbusters.
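The tree-search idea mentioned in the note on J > 15 can be sketched as a beam search over variable subsets. This is one possible reading of the description (keep the 10 best subsets at each size and extend only those), not the Steinley and Brusco implementation; the function names, the `beam_width` parameter, and the toy scoring function are hypothetical, and a real application would score a subset by the quality of its cluster solution.

```python
def n_subsets(J):
    """Number of nonempty subsets of J candidate variables."""
    return 2 ** J - 1


def tree_search(J, score, beam_width=10, max_size=None):
    """Grow subsets from j to j + 1 variables, keeping the best few per stage.

    `score` maps a sorted tuple of variable indices to a number
    (higher is better). Returns the best subset found at each size.
    Assumes J >= 1.
    """
    max_size = J if max_size is None else min(max_size, J)

    def rank(subset):
        # Best score first; ties broken by index order for reproducibility.
        return (-score(subset), subset)

    frontier = sorted(((j,) for j in range(J)), key=rank)[:beam_width]
    best_by_size = {1: frontier[0]}
    for size in range(2, max_size + 1):
        # Branch only from the retained subsets, adding one new variable each.
        candidates = {tuple(sorted(s + (j,)))
                      for s in frontier for j in range(J) if j not in s}
        frontier = sorted(candidates, key=rank)[:beam_width]
        best_by_size[size] = frontier[0]
    return best_by_size


def toy_score(subset):
    """Hypothetical score: variables 0-2 carry signal, the rest are noise."""
    return sum(1 if j < 3 else -1 for j in subset)


print(n_subsets(20))                  # size of the exhaustive search space
print(tree_search(20, toy_score)[3])  # best three-variable subset found
```

With a beam of 10, each stage evaluates at most 10 × (J − 1) candidate subsets, so the work grows roughly quadratically in J instead of exponentially. The price is that the heuristic can miss the best subset when a good combination contains variables that score poorly on their own.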

References

Ahlawat, H., G. Chierchia, and P. van Arkel. 2014. The secret of successful drug launches. McKinsey & Company report, March. http://www.mckinsey.com/industries/pharmaceuticals-and-medical-products/our-insights/the-secret-of-successful-drug-launches. Accessed 5 Oct 2018.
Arabie, P., and L.J. Hubert. 1994. Cluster analysis in marketing research. In Advanced methods of marketing research, ed. R.P. Bagozzi, 160–189. Oxford: Blackwell.
Bishop, C.M. 1995. Neural networks for pattern recognition. New York: Oxford University Press.
Bozdogan, H. 1994. Choosing the number of clusters, subset selection of variables, and outlier detection in the standard mixture-model cluster analysis. In New approaches in classification and data analysis, ed. E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, and B. Burtschy, 169–177. Berlin: Springer.
Brusco, M.J., and J.D. Cradit. 2001. A variable-selection heuristic for K-means clustering. Psychometrika 66 (2): 249–270.
Brusco, M.J., R. Singh, J.D. Cradit, and D. Steinley. 2017. Cluster analysis in OM research: Survey and recommendations. International Journal of Operations and Production Management 37 (3): 300–320.
Brusco, M.J., and D. Steinley. 2007. A comparison of heuristic procedures for minimum within-cluster sums of squares partitioning. Psychometrika 72 (4): 583–600.
Caliński, T., and J. Harabasz. 1974. A dendrite method for cluster analysis. Communications in Statistics 3 (1): 1–27.
Carmone, F.J., A. Kara, and S. Maxwell. 1999. HINoV: A new model to improve market segmentation by identifying noisy variables. Journal of Marketing Research 36 (4): 501–509.
Cook, A.G. 2006. Forecasting for the pharmaceutical industry. Aldershot: Gower Publishing.
Corstjens, M., E. Demeire, and I. Horowitz. 2005. New-product success in the pharmaceutical industry: How many bites at the cherry? Economics of Innovation and New Technology 14 (4): 319–331.
DeSarbo, W.S., J.D. Carroll, L.A. Clark, and P.E. Green. 1984. Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables. Psychometrika 49 (1): 57–78.
Dy, J.G., and C.E. Brodley. 2004. Feature selection for unsupervised learning. Journal of Machine Learning Research 5: 845–889.
Fischer, M., P.S.H. Leeflang, and P.C. Verhoef. 2010. Drivers of peak sales for pharmaceutical brands. Quantitative Marketing and Economics 8 (4): 429–460.
Fowlkes, E.B., and C.L. Mallows. 1983. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association 78 (383): 553–584.
Friedman, J.H., and J.J. Meulman. 2004. Clustering objects on subsets of attributes. Journal of the Royal Statistical Society B 66 (4): 815–849.
Gnanadesikan, R., J.R. Kettenring, and S.L. Tsao. 1995. Weighting and selection of variables for cluster analysis. Journal of Classification 12 (1): 113–136.
Grabowski, H., and J. Vernon. 1990. A new look at the returns and risks to pharmaceutical R&D. Management Science 36 (7): 804–821.
Green, P.E., F.J. Carmone, and J. Kim. 1990. A preliminary study of optimal variable weighting in K-means clustering. Journal of Classification 7 (2): 271–285.
Hair, J.F., W.C. Black, B.J. Babin, and R.E. Anderson. 2014. Multivariate data analysis, 7th ed. Upper Saddle River: Pearson Prentice Hall.
Han, J., M. Kamber, and J. Pei. 2012. Data mining: Concepts and techniques, 3rd ed. Amsterdam: Elsevier.
Helsen, K., and P.E. Green. 1991. A computational study of replicated clustering with an application to market segmentation. Decision Sciences 22 (5): 1124–1141.
Henard, D.H., and D.M. Szymanski. 2001. Why some new products are more successful than others. Journal of Marketing Research 38 (3): 362–375.
Hubert, L., and P. Arabie. 1985. Comparing partitions. Journal of Classification 2 (2): 193–218.
Jain, A.K. 2010. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31 (8): 651–666.
Jain, A.K., M.N. Murty, and P.J. Flynn. 1999. Data clustering: A review. ACM Computing Surveys 31 (3): 264–323.
Jain, P., P. Sharma, and L. Jayaraman. 2014. Behind every good decision: How anyone can use business analytics to turn data into profitable insight. New York: American Management Association.
Kalyanaram, G., W.T. Robinson, and G.L. Urban. 1995. Order of market entry: Established empirical generalizations, emerging empirical generalizations, and future research. Marketing Science 14 (3): G212–G221.
Kerin, R.A., P.R. Varadarajan, and R.A. Peterson. 1992. First-mover advantage: A synthesis, conceptual framework, and research propositions. Journal of Marketing 56 (4): 33–52.
Kim, S.-S. 2015. Variable selection and outlier detection for automated K-means clustering. Communications for Statistical Applications and Methods 22 (1): 55–67.
Koubaa, Y., R.S. Tabbane, and M. Hamouda. 2017. Segmentation of the senior market: How do different variable sets discriminate between senior segments? Journal of Marketing Analytics 5 (3–4): 99–110.
Law, M.H.C., M.A.T. Figueiredo, and A.K. Jain. 2004. Simultaneous feature selection and clustering using mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (9): 1154–1166.
Li, J.J. 2014. Blockbuster drugs: The rise and decline of the pharmaceutical industry. New York: Oxford University Press.
Mathwick, C. 2002. Understanding the online consumer: A typology of online relational norms and behavior. Journal of Interactive Marketing 16 (1): 40–55.
Milligan, G.W. 1989. A validation study of a variable-weighting algorithm for cluster analysis. Journal of Classification 6 (1): 53–71.
Milligan, G.W., and M.C. Cooper. 1986. A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research 21 (4): 441–458.
Montanari, A., and L. Lizzani. 2001. A projection pursuit approach to variable selection. Computational Statistics & Data Analysis 35 (4): 463–473.
Narayanan, S., R. Desiraju, and P.K. Chintagunta. 2004. Return on investment implications for pharmaceutical promotional expenditures: The role of marketing-mix interactions. Journal of Marketing 68 (4): 90–105.
Osinga, E.C., P.S.H. Leeflang, and J.E. Wieringa. 2010. Early marketing matters: A time-varying parameter approach to persistence modeling. Journal of Marketing Research 47 (1): 173–185.
Palazzo, M., A. Vollero, and A. Siano. 2016. Identifying new segments from a global branding perspective: A three-country study. Journal of Marketing Analytics 4 (4): 159–171.
Raftery, A.E., and N. Dean. 2006. Variable selection for model-based clustering. Journal of the American Statistical Association 101 (473): 168–178.
Resney, R., A. Aboshiha, E. Carlisle, and S. Waddell. 2017. Launch for long-term success. Pharmaceutical Executive report, 9 May. http://www.pharmexec.com/launch-long-term-success. Accessed 5 Oct 2018.
Shankar, V., G.S. Carpenter, and L. Krishnamurthi. 1998. Late mover advantage: How innovative late entrants outsell pioneers. Journal of Marketing Research 35 (1): 54–70.
Steinhaus, H. 1956. Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des Sciences, Classe III, IV (12): 801–804.
Steinley, D. 2004. Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods 9 (3): 386–396.
Steinley, D. 2006. K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology 59 (1): 1–34.
Steinley, D., and M.J. Brusco. 2008a. A new variable weighting and selection procedure for K-means cluster analysis. Multivariate Behavioral Research 43 (1): 77–108.
Steinley, D., and M.J. Brusco. 2008b. Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika 73 (1): 125–144.
Steinley, D., M.J. Brusco, and L. Hubert. 2016. The variance of the adjusted Rand index. Psychological Methods 21 (2): 261–272.
Urban, G.L., and J.R. Hauser. 1993. Design and marketing of new products. Englewood Cliffs: Prentice-Hall.
Wedel, M., and W.A. Kamakura. 2000. Market segmentation: Conceptual and methodological foundations, 2nd ed. Dordrecht: Kluwer.
Winegarden, W. 2017. U.S. Pharmaceutical pricing in context. San Francisco: Pacific Research Institute.

Author information

Authors and Affiliations

School of Business and Economics, Indiana University East, Richmond, IN, 47374, USA
Susan Brudvig
Department of Business Analytics, Information Systems, and Supply Chain, Florida State University, Tallahassee, FL, 32306, USA
Michael J. Brusco & J. Dennis Cradit

Corresponding author

Correspondence to Susan Brudvig.

About this article

Cite this article

Brudvig, S., Brusco, M.J. & Cradit, J.D. Joint selection of variables and clusters: recovering the underlying structure of marketing data. J Market Anal 7, 1–12 (2019). https://doi.org/10.1057/s41270-018-0045-7

Revised: 24 August 2018
Published: 08 November 2018
Issue Date: 11 March 2019
DOI: https://doi.org/10.1057/s41270-018-0045-7
