Journal of Classification, Volume 12, Issue 1, pp 21–55

A mixture likelihood approach for generalized linear models

  • Michel Wedel
  • Wayne S. DeSarbo


A mixture model approach is developed that simultaneously estimates the posterior membership probabilities of observations to a number of unobservable groups or latent classes, and the parameters of a generalized linear model which relates the observations, distributed according to some member of the exponential family, to a set of specified covariates within each class. We demonstrate how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature. As such, we generalize the McCullagh and Nelder approach to a latent class framework. The parameters are estimated using maximum likelihood, and an EM algorithm for estimation is provided. A Monte Carlo study of the performance of the algorithm for several distributions is provided, and the model is illustrated in two empirical applications.
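The estimation idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Poisson response with a log link as one illustrative member of the exponential family, uses a few weighted IRLS steps as an approximate M-step, and all function and variable names (`em_poisson_mixture`, `weighted_poisson_irls`, `post`, `pis`, `betas`) are hypothetical.

```python
import numpy as np

def weighted_poisson_irls(X, y, w, beta, n_steps=5):
    """A few IRLS steps for a Poisson GLM (log link) with observation weights w."""
    for _ in range(n_steps):
        eta = np.clip(X @ beta, -20, 20)      # linear predictor, clipped for stability
        mu = np.exp(eta)                      # class-specific mean
        z = eta + (y - mu) / mu               # working response
        W = w * mu                            # posterior weight times IRLS weight
        XtW = X.T * W
        # small ridge term guards against a singular system (an assumption of this sketch)
        beta = np.linalg.solve(XtW @ X + 1e-8 * np.eye(X.shape[1]), XtW @ z)
    return beta

def em_poisson_mixture(X, y, n_classes=2, n_iter=200, seed=0):
    """EM for a finite mixture of Poisson regressions (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    post = rng.dirichlet(np.ones(n_classes), size=n)   # random soft start
    betas = np.zeros((n_classes, p))
    betas[:, 0] = np.log(y.mean() + 1e-8)              # assumes column 0 is the intercept
    loglik_path = []
    for _ in range(n_iter):
        # M-step: class shares and class-specific regression coefficients
        pis = post.mean(axis=0)
        for s in range(n_classes):
            betas[s] = weighted_poisson_irls(X, y, post[:, s], betas[s])
        # E-step: posterior membership probabilities via Bayes' rule
        eta = np.clip(X @ betas.T, -20, 20)
        log_f = y[:, None] * eta - np.exp(eta)         # Poisson log density without -log(y!)
        log_joint = np.log(np.maximum(pis, 1e-300)) + log_f
        m = log_joint.max(axis=1, keepdims=True)
        log_mix = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
        post = np.exp(log_joint - log_mix[:, None])
        loglik_path.append(log_mix.sum())              # log-likelihood up to a constant
    return pis, betas, post, loglik_path
```

Here `X` is an n-by-p design matrix whose first column is assumed to be an intercept and `y` is a vector of counts. The returned `post` holds the posterior membership probabilities and `betas` the class-specific coefficients. Because the M-step is only approximate and EM can stop at local optima, several random starts (different `seed` values) would normally be compared.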

Key Words

Mixture models, Generalized linear models, EM algorithm, Maximum likelihood estimation



  1. AITKIN, M., and RUBIN, D. B. (1985), “Estimation and Hypothesis Testing in Finite Mixture Distributions,” Journal of the Royal Statistical Society, Series B, 47, 67–75.
  2. AITKIN, M., ANDERSON, D., and HINDE, J. (1981), “Statistical Modelling of Data on Teaching Styles (with discussion),” Journal of the Royal Statistical Society, Series A, 144, 419–461.
  3. AKAIKE, H. (1974), “A New Look at Statistical Model Identification,” IEEE Transactions on Automatic Control, AC-19, 716–723.
  4. ATKINSON, K. E. (1989), An Introduction to Numerical Analysis, New York: Wiley.
  5. BANFIELD, C. F., and BASSIL, L. C. (1977), “A Transfer Algorithm for Non-Hierarchical Classification,” Applied Statistics, 26, 206–210.
  6. BASFORD, K. E., and MCLACHLAN, G. J. (1985), “The Mixture Method of Clustering Applied to Three-Way Data,” Journal of Classification, 2, 109–125.
  7. BAWA, K., and SHOEMAKER, R. W. (1987), “The Coupon-Prone Consumer: Some Findings Based on Purchase Behavior Across Product Classes,” Journal of Marketing, 51, 99–110.
  8. BAWA, K., and SHOEMAKER, R. W. (1989), “Analyzing Incremental Sales from a Direct Mail Coupon Promotion,” Journal of Marketing, 53, 66–78.
  9. BERK, R. H. (1972), “Consistency and Asymptotic Normality of MLE's for Exponential Models,” Annals of Mathematical Statistics, 43, 193–204.
  10. BHATTACHARYA, C. G. (1967), “A Simple Method for Resolution of a Distribution into its Gaussian Components,” Biometrics, 23, 115–135.
  11. BLATTBERG, R. C., BUESING, T., PEACOCK, P., and SEN, S. K. (1978), “Identifying the Deal Prone Segment,” Journal of Marketing Research, 15, 369–377.
  12. BOYLES, R. A. (1983), “On Convergence of the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 45, 47–50.
  13. BOZDOGAN, H. (1987), “Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions,” Psychometrika, 52, 345–370.
  14. BOZDOGAN, H., and SCLOVE, S. L. (1984), “Multi-sample Cluster Analysis Using Akaike's Information Criterion,” Annals of the Institute of Statistical Mathematics, 36, 163–180.
  15. CASSIE, R. M. (1954), “Some Uses of Probability Paper for the Graphical Analysis of Polymodal Frequency Distributions,” Australian Journal of Marine and Freshwater Research, 5, 513–522.
  16. CHARLIER, C. V. L., and WICKSELL, S. D. (1924), “On the Dissection of Frequency Functions,” Arkiv för Matematik, Astronomi och Fysik, BD 18, 6.
  17. COHEN, J. (1988), Statistical Power Analysis for the Behavioral Sciences, Hillsdale: Lawrence Erlbaum.
  18. COTTON, B. C., and BABB, E. M. (1978), “Consumer Response to Promotional Deals,” Journal of Marketing, 42, 109–113.
  19. CRAMÉR, H. (1946), Mathematical Methods of Statistics, Princeton: Princeton University Press.
  20. DAVIES, R. B. (1977), “Hypothesis Testing When a Nuisance Parameter is Present Only Under the Alternative,” Biometrika, 64, 247–254.
  21. DAY, N. E. (1969), “Estimating the Components of a Mixture of two Normal Distributions,” Biometrika, 56, 463–474.
  22. DEMPSTER, A. P. (1971), “An Overview of Multivariate Data Analysis,” Journal of Multivariate Analysis, 1, 316–346.
  23. DEMPSTER, A. P., LAIRD, N. M., and RUBIN, D. B. (1977), “Maximum Likelihood from Incomplete Data via the EM-Algorithm,” Journal of the Royal Statistical Society, Series B, 39, 1–38.
  24. DESARBO, W. S., OLIVER, R. L., and DE SOETE, G. (1986), “A Probabilistic Multi-Dimensional Scaling Vector Model,” Applied Psychological Measurement, 10, 79–98.
  25. DESARBO, W. S., and CRON, W. L. (1988), “A Maximum Likelihood Methodology for Clusterwise Linear Regression,” Journal of Classification, 5, 249–282.
  26. DESARBO, W. S., OLIVER, R. L., and RANGASWAMY, A. (1989), “A Simulated Annealing Methodology for Clusterwise Regression,” Psychometrika, 54, 707–736.
  27. DESARBO, W. S., WEDEL, M., VRIENS, M., and RAMASWAMY, V. (1992), “Latent Class Metric Conjoint Analysis,” Marketing Letters, 3, 273–288.
  28. DESARBO, W. S., RAMASWAMY, V., REIBSTEIN, D. J., and ROBINSON, W. T. (1993), “A Latent Pooling Methodology for Regression Analysis with Limited Time Series of Cross Sections: a PIMS Data Application,” Marketing Science, 12, 103–124.
  29. DE SOETE, G., and DESARBO, W. S. (1991), “A Latent Class Probit Model for Analyzing Pick Any/N Data,” Journal of Classification, 8, 45–63.
  30. DODSON, J. A., TYBOUT, A. M., and STERNTHAL, B. (1978), “Impact of Deals and Deal Retraction on Brand Switching,” Journal of Marketing Research, 15, 72–81.
  31. EVERITT, B. S. (1984), “Maximum Likelihood Estimation of the Parameters in a Mixture of two Univariate Normal Distributions: A Comparison of Different Algorithms,” The Statistician, 33, 205–215.
  32. EVERITT, B. S., and HAND, D. J. (1981), Finite Mixture Distributions, London: Chapman and Hall.
  33. FISHER, R. A. (1935), “The Case of Zero Survivors,” (Appendix to Bliss, C.I. (1935)), Annals of Applied Biology, 22, 164–165.
  34. FOWLKES, E. B. (1979), “Some Methods for Studying Mixtures of two Normal (Lognormal) Distributions,” Journal of the American Statistical Association, 74, 561–575.
  35. FRYER, I. G., and ROBERTSON, C. A. (1972), “A Comparison of Some Methods for Estimating Mixed Normal Distributions,” Biometrika, 59, 639–648.
  36. GHOSH, J. M., and SEN, P. K. (1985), “On the Asymptotic Performance of the Loglikelihood Ratio Statistic for the Mixture Model and Related Results,” Proceedings of the Berkeley Conference, Neyman, and Kiefer, II, Monterey: Wadsworth, 789–806.
  37. GOODMAN, L. A. (1974), “Exploratory Latent Structure Analysis Using Both Identifiable and Unidentifiable Models,” Biometrika, 61, 215–231.
  38. GREEN, P. E., and RAO, V. R. (1971), “Conjoint Measurement for Quantifying Judgmental Data,” Journal of Marketing Research, 8, 355–363.
  39. GREEN, P. J. (1984), “Iteratively Reweighted Least Squares for Maximum Likelihood Estimation, and Some Robust and Resistant Alternatives,” Journal of the Royal Statistical Society, Series B, 46, 149–192.
  40. HABERMAN, S. J. (1977), “Maximum Likelihood Estimates in Exponential Response Models,” Annals of Statistics, 5, 815–841.
  41. HARDING, I. P. (1948), “The Use of Probability Paper for the Graphical Analysis of Polymodal Frequency Distributions,” Journal of the Marine Biological Association (UK), 28, 141–153.
  42. HASSELBLAD, V. (1966), “Estimation of Parameters for a Mixture of Normal Distributions,” Technometrics, 8, 431–444.
  43. HASSELBLAD, V. (1969), “Estimation of Finite Mixtures of Distributions from the Exponential Family,” Journal of the American Statistical Association, 64, 1459–1471.
  44. HOPE, A. C. A. (1968), “A Simplified Monte Carlo Significance Test Procedure,” Journal of the Royal Statistical Society, Series B, 30, 582–598.
  45. HOSMER, D. W. (1974), “Maximum Likelihood Estimates of the Parameters of a Mixture of two Regression Lines,” Communications in Statistics, 3, 995–1006.
  46. JONES, P. N., and MCLACHLAN, G. J. (1992), “Fitting Finite Mixture Models in a Regression Context,” Australian Journal of Statistics, 43, 233–240.
  47. JONES, P. N., and MCLACHLAN, G. J. (1992), “Improving the Convergence Rate of the EM Algorithm for a Mixture Model Fitted to Grouped and Truncated Data,” Journal of Statistical Computation and Simulation, 43, 31–44.
  48. JORGENSEN, B. (1984), “The Delta Algorithm and GLIM,” International Statistical Review, 52, 283–300.
  49. KAMAKURA, W. A., and RUSSELL, G. J. (1989), “A Probabilistic Choice Model for Market Segmentation and Elasticity Structure,” Journal of Marketing Research, 26, 379–390.
  50. KAMAKURA, W. A. (1991), “Estimating Flexible Distributions of Ideal-points with External Analysis of Preference,” Psychometrika, 56, 419–448.
  51. LANGEHEINE, R., and ROST, J. (1988), Latent Trait and Latent Class Models, New York: Plenum.
  52. LAZARSFELD, P. F., and HENRY, N. W. (1968), Latent Structure Analysis, Boston: Houghton-Mifflin.
  53. LI, L. A., and SEDRANSK, N. (1988), “Mixtures of Distributions: A Topological Approach,” Annals of Statistics, 16, 1623–1634.
  54. LICHTENSTEIN, D. R., NETEMEYER, R. G., and BURTON, S. (1990), “Distinguishing Coupon Proneness from Value Consciousness: An Acquisition-Transaction Utility Theory Perspective,” Journal of Marketing, 54, 54–67.
  55. LOUIS, T. A. (1982), “Finding the Observed Information Matrix When Using the EM Algorithm,” Journal of the Royal Statistical Society, Series B, 44, 226–233.
  56. LWIN, T., and MARTIN, P. J. (1989), “Probits of Mixtures,” Biometrics, 45, 721–732.
  57. MCCULLAGH, P., and NELDER, J. A. (1989), Generalized Linear Models, New York: Chapman and Hall.
  58. MCHUGH, R. B. (1956), “Efficient Estimation and Local Identification in Latent Class Analysis,” Psychometrika, 21, 331–347.
  59. MCHUGH, R. B. (1958), “Note on Efficient Estimation and Local Identification in Latent Class Analysis,” Psychometrika, 23, 273–274.
  60. MCLACHLAN, G. J. (1982), “The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis,” in Handbook of Statistics (vol 2), Eds., P. R. Krishnaiah and L. N. Kanal, Amsterdam: North-Holland, 199–208.
  61. MCLACHLAN, G. J. (1987), “On Bootstrapping the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture,” Applied Statistics, 36, 318–324.
  62. MCLACHLAN, G. J., and BASFORD, K. E. (1988), Mixture Models: Inference and Application to Clustering, New York: Marcel Dekker.
  63. MEILIJSON, J. (1989), “A Fast Improvement of the EM Algorithm on Its Own Terms,” Journal of the Royal Statistical Society, Series B, 51, 127–138.
  64. MOOIJAART, A., and VAN DER HEIJDEN, P. G. M. (1992), “The EM Algorithm for Latent Class Analysis with Constraints,” Psychometrika, 57, 261–271.
  65. NARASIMHAN, C. (1984), “A Price Discrimination Theory of Coupons,” Marketing Science, 3, 125–145.
  66. NELDER, J. A., and WEDDERBURN, R. W. M. (1972), “Generalized Linear Models,” Journal of the Royal Statistical Society, Series A, 135, 370–384.
  67. NEWCOMB, S. (1886), “A Generalized Theory of the Combination of Observations So As To Obtain the Best Result,” American Journal of Mathematics, 8, 343–366.
  68. NIELSEN, A. C. (1965), “The Impact of Retail Coupons,” Journal of Marketing (October), 11–15.
  69. OLIVER, R. L. (1980), “A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions,” Journal of Marketing Research, 17, 460–469.
  70. OLIVER, R. L., and DESARBO, W. S. (1988), “Response Determinants in Satisfaction Judgments,” Journal of Consumer Research, 14, 495–507.
  71. PEARSON, K. (1894), “Contributions to the Mathematical Theory of Evolution,” Philosophical Transactions, A, 185, 71–110.
  72. PETERS, B. C., and WALKER, H. F. (1978), “An Iterative Procedure for Obtaining Maximum Likelihood Estimates of the Parameters of a Mixture of Normal Distributions,” Journal of Applied Mathematics, 35, 362–378.
  73. QUANDT, R. E., and RAMSEY, J. B. (1978), “Estimating Mixtures of Normal Distributions and Switching Regressions,” Journal of the American Statistical Association, 73, 730–738.
  74. QUANDT, R. E. (1972), “A New Approach to Estimating Switching Regressions,” Journal of the American Statistical Association, 67, 306–310.
  75. REDNER, R. A., and WALKER, H. F. (1984), “Mixture Densities, Maximum Likelihood and the EM Algorithm,” SIAM Review, 26, 195–239.
  76. RUBINSTEIN, R. Y. (1981), Simulation and the Monte Carlo Method, New York: Wiley.
  77. SCHWARZ, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6, 461–464.
  78. SCLOVE, S. L. (1987), “Applications of Model-Selection Criteria to some Problems in Multivariate Analysis,” Psychometrika, 52, 333–343.
  79. SHIMP, T. A., and KAVAS, A. (1984), “The Theory of Reasoned Action Applied to Coupon Usage,” Journal of Consumer Research, 11, 795–809.
  80. STIGLER, S. M. (1986), The History of Statistics, Cambridge, Mass: Harvard University Press.
  81. SYMONS, M. J. (1981), “Clustering Criteria and Multivariate Normal Mixtures,” Biometrics, 37, 35–43.
  82. TEEL, J. E., WILLIAMS, R. H., and BEARDEN, W. O. (1980), “Correlates of Consumer Susceptibility to Coupons in New Grocery Product Introductions,” Journal of Advertising, 3, 31–35.
  83. TEICHER, H. (1961), “Identifiability of Mixtures,” Annals of Mathematical Statistics, 31, 55–73.
  84. TITTERINGTON, D. M. (1990), “Some Recent Research in the Analysis of Mixture Distributions,” Statistics, 4, 619–641.
  85. TITTERINGTON, D. M., SMITH, A. F. M., and MAKOV, U. E. (1985), Statistical Analysis of Finite Mixture Distributions, New York: Wiley.
  86. THOMAS, E. A. C. (1966), “Mathematical Models for the Clustered Firing of Single Cortical Neurons,” British Journal of Mathematical and Statistical Psychology, 19, 151–162.
  87. VILCASSIM, N. J., and WITTINK, D. R. (1987), “Supporting a Higher Shelf Price Through Coupon Distributions,” Journal of Consumer Marketing, 4, 29–39.
  88. WARD, R. W., and DAVIS, J. E. (1978), “A Pooled Cross-Section Time Series Model of Coupon Promotions,” American Journal of Agricultural Economics, (August), 193–401.
  89. WEDEL, M., and DESARBO, W. S. (1994), “A Review of Recent Developments in Latent Class Regression Models,” in Advanced Methods of Marketing Research, Ed., R. Bagozzi, 352–388.
  90. WEDEL, M., DESARBO, W. S., BULT, J. R., and RAMASWAMY, V. (1993), “A Latent Class Poisson Regression Model for Heterogeneous Count Data With an Application to Direct Mail,” Journal of Applied Econometrics, 8, 397–411.
  91. WEDEL, M., and DESARBO, W. S. (1993), “A Latent Class Binomial Logit Methodology for the Analysis of Paired Comparison Data: An Application Reinvestigating the Determinants of Perceived Risk,” Decision Sciences, 24, 1157–1170.
  92. WOLFE, J. H. (1970), “Pattern Clustering by Multivariate Mixture Analysis,” Multivariate Behavioral Research, 5, 329–350.
  93. WU, C. F. J. (1983), “On the Convergence Properties of the EM Algorithm,” Annals of Statistics, 11, 95–103.

Copyright information

© Springer-Verlag 1995

Authors and Affiliations

  1. Department of Business Administration and Management Science, Faculty of Economics, University of Groningen, Groningen, The Netherlands
  2. University of Michigan, USA