Abstract
Mixture regression models are an important method for uncovering unobserved heterogeneity. A fundamental challenge in their application relates to the identification of the appropriate number of segments to retain from the data. Prior research has provided several simulation studies that compare the performance of different segment retention criteria. Although collinearity between the predictor variables is a common phenomenon in regression models, its effect on the performance of these criteria has not been analyzed thus far. We address this gap in research by examining the performance of segment retention criteria in mixture regression models characterized by systematically increased collinearity levels. The results have fundamental implications and provide guidance for using mixture regression models in empirical (marketing) studies.
Notes
For the mathematical specification of the criteria, see Table A1 in the Online Supplement of this paper.
These factor levels combine Andrews and Currim’s (2003b) two factors “number of individuals” (100 or 300) and “number of observations per individual” (5 or 10).
Note that prior studies used unstandardized mean separations with a random distribution of coefficients, which makes a full replication impossible as detailed information on the specified variances is missing.
The balanced factor level involves equally sized segments, while the unbalanced factor levels characterize the existence of one segment that is considerably larger than the other segments. Specifically, the unbalanced segments exhibit the following relative sizes: 65 %/35 % (unbalanced) and 80 %/20 % (very unbalanced) in a situation with two segments, 50 %/25 %/25 % (unbalanced) and 66.66 %/16.66 %/16.66 % (very unbalanced) in the case of three segments, and 40 %/20 %/20 %/20 % (unbalanced) and 55 %/15 %/15 %/15 % (very unbalanced) in the case of four segments.
For the correlation matrices, see Table A2 in the Online Supplement of this paper.
For an illustration of the difference between consistent and inconsistent correlation matrices between segments, see Table A3 in the Online Supplement of this paper.
Note that the numbers do not always add to 100 % because of rounding inaccuracies. The more precise numbers of 82.38, 11.21, and 6.41 % add to 100 %.
For detailed results, see Table A4 in the Online Supplement of this paper.
We thank an anonymous reviewer for this suggestion.
For the complete table with all criteria’s results, see Table A5 in the Online Supplement of this paper.
For the ANCOVA results, see Table A6 in the Online Supplement of this paper.
For example, Kim et al. (2013) extend the new Bayesian latent structure regression model by Kim et al. (2012) by implementing model constraints and illustrating these in comparative analyses that contrast the performance of the proposed methodology with standard latent class finite mixture regression, as well as with traditional Bayesian finite mixture regression. The authors show that the new Bayesian regression model is more robust against collinearity problems than both the finite mixture regression models and traditional Bayesian finite mixture models in terms of the RMSE and ARI. In addition, the new Bayesian regression model can be used to simultaneously select the number of segments and the variables to retain per segment.
We thank an anonymous reviewer for these comments.
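The relative segment sizes given in the note on balanced and unbalanced factor levels can be written down as a small configuration and checked for consistency. The following Python code is a minimal sketch, not the authors' simulation code: the dictionary layout, the function names, and the sample size of 300 used in the usage lines are assumptions made for illustration only. It checks that each set of shares adds to roughly 100 % and converts shares into integer segment counts with a largest-remainder rule, which is one possible way of handling the rounding in shares such as 66.66 %/16.66 %/16.66 %.

```python
import math

# Relative segment sizes in percent, copied from the note on segment sizes.
# The nesting (number of segments -> factor level) is an illustrative choice.
SEGMENT_SIZES = {
    2: {"unbalanced": [65, 35], "very unbalanced": [80, 20]},
    3: {"unbalanced": [50, 25, 25], "very unbalanced": [66.66, 16.66, 16.66]},
    4: {"unbalanced": [40, 20, 20, 20], "very unbalanced": [55, 15, 15, 15]},
}


def balanced_sizes(n_segments):
    """Equal percentage shares for the balanced factor level."""
    return [100 / n_segments] * n_segments


def shares_sum_to_100(shares, tol=0.05):
    """True if the shares add to 100 % within a rounding tolerance."""
    return abs(sum(shares) - 100) < tol


def segment_counts(shares, n_total):
    """Turn percentage shares into integer counts that sum to n_total.

    Uses a largest-remainder rule (an assumption, not taken from the paper):
    floor every count, then hand the leftover units to the segments with
    the largest fractional parts.
    """
    total = sum(shares)
    raw = [s * n_total / total for s in shares]
    counts = [math.floor(r) for r in raw]
    leftover = n_total - sum(counts)
    by_remainder = sorted(
        range(len(shares)), key=lambda i: raw[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Usage with a hypothetical sample of 300 observations.
for n_segments, levels in SEGMENT_SIZES.items():
    for level, shares in levels.items():
        print(n_segments, level, segment_counts(shares, 300))
```

Normalizing by the sum of the shares rather than by 100 means that rounded shares still yield counts that add up to the requested sample size.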
References
Andrews, R. L., & Currim, I. S. (2003a). A comparison of segment retention criteria for finite mixture logit models. Journal of Marketing Research, 40(2), 235–243.
Andrews, R. L., & Currim, I. S. (2003b). Retention of latent segments in regression-based marketing models. International Journal of Research in Marketing, 20(4), 315–321.
Andrews, R. L., Ainslie, A., & Currim, I. S. (2002a). An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity. Journal of Marketing Research, 39(4), 479–487.
Andrews, R. L., Ansari, A., & Currim, I. S. (2002b). Hierarchical Bayes versus finite mixture conjoint analysis models: a comparison of fit, prediction and partworth recovery. Journal of Marketing Research, 39(1), 87.
Andrews, R. L., Currim, I. S., Leeflang, P., & Lim, J. (2007). Estimating the SCAN*PRO model of store sales: HB, FM or just OLS? International Journal of Research in Marketing, 25(1), 22–33.
Andrews, R. L., Brusco, M. J., Currim, I. S., & Davis, B. (2010). An empirical comparison of methods for clustering problems: are there benefits from having a statistical model? Review of Marketing Science, 8(1), 1–32.
Boone, D. S., & Roehm, M. (2002). Evaluating the appropriateness of market segmentation solutions using artificial neural networks and the membership clustering criterion. Marketing Letters, 13(4), 317–333.
Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria in a new information measure of complexity. Paper presented at the Proceedings of the First US/Japan Conference on Frontiers of Statistical Modelling: An Information Approach.
Claeskens, G., & Hart, J. D. (2009). Goodness-of-fit tests in mixed models. Test, 18(2), 213–239.
R Core Team (2014). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Cortiñas, M., Chocarro, R., & Villanueva, M. L. (2010). Understanding multi-channel banking customers. Journal of Business Research, 63(11), 1215–1221.
DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5(2), 249–282.
DeSarbo, W. S., Kamakura, W., & Wedel, M. (2004). Applications of multivariate latent variable models in marketing. In Y. Wind & P. E. Green (Eds.), Market Research and Modeling: Progress and Prospects. A Tribute to Paul E. Green (pp. 43–68). Boston: Kluwer Academic Publishers.
DeSarbo, W. S., Di Benedetto, C. A., & Song, M. (2007). A heterogeneous resource based view for exploring relationships between firm performance and capabilities. Journal of Modelling in Management, 2(2), 103–130.
Dubois, B., Czellar, S., & Laurent, G. (2005). Consumer segmentation based on attitudes toward luxury: empirical evidence from twenty countries. Marketing Letters, 16(2), 115–128.
Grewal, R., Cote, J. A., & Baumgartner, H. (2004). Multicollinearity and measurement error in structural equation models: implications for theory testing. Marketing Science, 23(4), 519–529.
Grewal, R., Chakravarty, A., Ding, M., & Liechty, J. (2008). Counting chickens before the eggs hatch: associating new product development portfolios with shareholder expectations in the pharmaceutical sector. International Journal of Research in Marketing, 25(3), 261–272.
Grewal, R., Chandrashekaran, M., & Citrin, A. V. (2010). Customer satisfaction heterogeneity and shareholder value. Journal of Marketing Research, 47(4), 612–626.
Grewal, R., Chandrashekaran, M., Johnson, J. L., & Mallapragada, G. (2013). Environments, unobserved heterogeneity, and the effect of market orientation on outcomes for high-tech firms. Journal of the Academy of Marketing Science, 41(2), 206–233.
Grün, B., & Leisch, F. (2008). FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters. Journal of Statistical Software, 28(4), 1–35.
Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th ed.). Englewood Cliffs: Prentice Hall.
Hawkins, D. S., Allen, D. M., & Stromberg, A. J. (2001). Determining the number of components in mixtures of linear models. Computational Statistics & Data Analysis, 38(1), 15–48.
Hennig, C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17(2), 273–296.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Hutchinson, J. W., Kamakura, W. A., & Lynch, J. G. (2000). Unobserved heterogeneity as an alternative explanation for “reversal” effects in behavioral research. Journal of Consumer Research, 27(3), 324–344.
Jagpal, S., Jedidi, K., & Jamil, M. (2007). A multibrand concept-testing methodology for new product strategy. Journal of Product Innovation Management, 24(1), 34–51.
Jedidi, K., Jagpal, H. S., & DeSarbo, W. S. (1997). Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16(1), 39–59.
Kim, B.-D., Fong, D. K. H., & DeSarbo, W. S. (2012). Model-based segmentation featuring simultaneous segment-level variable selection. Journal of Marketing Research, 49(5), 725–736.
Kim, S., Blanchard, S. J., DeSarbo, W. S., & Fong, D. K. H. (2013). Implementing managerial constraints in model-based segmentation: extensions of Kim, Fong, and DeSarbo (2012) with an application to heterogeneous perceptions of service quality. Journal of Marketing Research, 50(5), 664–673.
Kotler, P., & Keller, K. L. (2012). Marketing management (14th ed.). Pearson: Prentice-Hall.
Mantrala, M. K., Naik, P. A., Sridhar, S., & Thorson, E. (2007). Uphill or downhill? Locating the firm on a profit function. Journal of Marketing, 71(2), 26–44.
Marcoulides, G. A., Chin, W. W., & Saunders, C. (2012). When imprecise statistical statements become problematic: a response to Goodhue, Lewis, and Thompson. MIS Quarterly, 36(3), 717–728.
Mason, C. H., & Perreault, W. D. (1991). Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research, 28(3), 268–280.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York, NY: Wiley.
Ofir, C., & Khuri, A. (1986). Multicollinearity in marketing models: diagnostics and remedial measures. International Journal of Research in Marketing, 3(3), 181–205.
Sarstedt, M. (2008). Market segmentation with mixture regression models: understanding measures that guide model selection. Journal of Targeting, Measurement and Analysis for Marketing, 16(3), 228–246.
Sarstedt, M., & Ringle, C. M. (2010). Treating unobserved heterogeneity in PLS path modelling: a comparison of FIMIX-PLS with different data analysis strategies. Journal of Applied Statistics, 37(8), 1299–1318.
Wedel, M., & Kamakura, W. A. (2000). Market segmentation: conceptual and methodological foundations (2nd ed.). Boston: Kluwer.
Wedel, M., Kamakura, W., Arora, N., Bemmaor, A., Chiang, J., Elrod, T., et al. (1999). Discrete and continuous representations of unobserved heterogeneity in choice modeling. Marketing Letters, 10(3), 219–232.
Acknowledgments
The authors would like to thank Jörg Henseler (Radboud University Nijmegen) and Edward E. Rigdon (Georgia State University) for their comments on earlier versions of the paper.
Electronic supplementary material
Below is the link to the electronic supplementary material.
ESM 1
(DOCX 90.7 kb)
Cite this article
Becker, JM., Ringle, C.M., Sarstedt, M. et al. How collinearity affects mixture regression results. Mark Lett 26, 643–659 (2015). https://doi.org/10.1007/s11002-014-9299-9
DOI: https://doi.org/10.1007/s11002-014-9299-9