Abstract
In CUB models the uncertainty of choice is explicitly modelled as a Combination of discrete Uniform and shifted Binomial random variables. The basic concept, namely modelling the response as a mixture of a deliberate choice of a response category and an uncertainty component represented by a uniform distribution over the response categories, is extended to a much wider class of models. In particular, the deliberate choice can be determined by classical ordinal response models such as the cumulative and the adjacent categories model. One then obtains the traditional and the flexible models as special cases when the uncertainty component is irrelevant. It is shown that the effect of explanatory variables is underestimated if the uncertainty component is neglected in a cumulative-type mixture model. Visualization tools for the effects of variables are proposed, and the modelling strategies are evaluated on real data sets. It is demonstrated that the extended class of models frequently yields a better fit than classical ordinal response models without an uncertainty component.
References
Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, New York
Agresti A (2013) Categorical data analysis, 3rd edn. Wiley, New York
Aitkin M (1999) A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55:117–128
Anderson JA (1984) Regression and ordered categorical variables. J Royal Stat Soc B 46:1–30
Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46:373–388
Brant R (1990) Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 46:1171–1178
Breen R, Luijkx R (2010) Mixture models for ordinal data. Sociol Methods Res 39:3–24
Caffo B, An M-W, Rhode C (2007) Flexible random intercept models for binary outcomes using mixtures of normals. Comp Stat Data Anal 51:5220–5235
Cox C (1995) Location-scale cumulative odds models for ordinal data: A generalized non-linear model approach. Stat Med 14:1191–1203
D’Elia A, Piccolo D (2005) A mixture model for preference data analysis. Comp Stat Data Anal 49:917–934
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap, vol 57. CRC Press, London
Everitt BS (1988) A finite mixture model for the clustering of mixed-mode data. Stat Prob Lett 6(5):305–309
Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models. Springer, New York
Follmann DA, Lambert D (1991) Identifiability of finite mixtures of logistic regression models. J Stat Plan Infer 27(3):375–381
Gambacorta R, Iannario M (2013) Measuring job satisfaction with CUB models. Labour 27(2):198–224
Gertheiss J, Tutz G (2009) Penalized regression with ordinal predictors. Int Stat Rev 77:345–365
Gneiting T, Raftery A (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359–376
Greene W, Hensher D (2003) A latent class model for discrete choice analysis: contrasts with mixed logit. Trans Res Part B 39:681–689
Grilli L, Iannario M, Piccolo D, Rampichini C (2014) Latent class CUB models. Adv Data Anal Class 8(1):105–119
Grün B, Leisch F (2008) Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. J Class 25:225–247
Iannario M (2010) On the identifiability of a mixture model for ordinal data. Metron 68(1):87–94
Iannario M (2012a) Hierarchical CUB models for ordinal variables. Commun Stat Theory Methods 41:3110–3125
Iannario M (2012b) Modelling shelter choices in a class of mixture models for ordinal responses. Stat Methods Appl 21:1–22
Iannario M (2012c) Preliminary estimators for a mixture model of ordinal data. Adv Data Anal Class 6:163–184
Iannario M, Piccolo D (2010) Statistical modelling of subjective survival probabilities. Genus 66:17–42
Iannario M, Piccolo D (2012) CUB models: Statistical methods and empirical evidence. In: Kennett SSR (ed) Modern analysis of customer surveys: with applications using R. Wiley, New York, pp 231–258
Leroux BG (1992) Consistent estimation of a mixing distribution. Ann Stat 20:1350–1360
Liu Q, Agresti A (2005) The analysis of ordinal categorical data: An overview and a survey of recent developments. Test 14:1–73
Manisera M, Zuccolotto P (2014) Modeling rating data with nonlinear CUB models. Comp Stat Data Anal 78:100–118
McCullagh P (1980) Regression models for ordinal data (with discussion). J Royal Stat Soc B 42:109–127
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Mehta CR, Patel NR, Tsiatis AA (1984) Exact significance testing to establish treatment equivalence with ordered categorical data. Biometrics 40:819–825
Nair VN (1987) Chi-squared-type tests for ordered alternatives in contingency tables. J Am Stat Assoc 82:283–291
Peterson B, Harrell FE (1990) Partial proportional odds models for ordinal response variables. Appl Stat 39:205–217
Piccolo D (2003) On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Stat 5:85–104
Piccolo D (2006) Observed information matrix in MUB models. Quaderni di Stat 8:33–78
Tutz G (2012) Regression for categorical data. Cambridge University Press, Cambridge
Tutz G, Gertheiss J (2014) Rating scales as predictors—the old question of scale level and some answers. Psychometrika 79:357–376
Tutz G, Schauberger G (2013) Visualization of categorical response models - from data glyphs to parameter glyphs. J Comp Graph Stat 22(1):156–177
Wedel M, DeSarbo W (1995) A mixture likelihood approach for generalized linear models. J Class 12:21–55
Acknowledgments
This work has been partially supported by FIRB2012 project (Code RBFR12SHVV) at University of Perugia and the frame of Programme STAR (CUP E68C13000020003) at University of Naples Federico II, financially supported by UniNA and Compagnia di San Paolo. ISFOL survey data has been used under the agreement ISFOL/PLUS 2006/430.
Appendix
1.1 Identifiability
We assume that the number of categories is greater than 2 (\(k>2\)) and that there is an effect of a continuous covariate x, that is \(\gamma \ne 0\). Let the CUP model with the cumulative logit model in the preference part be represented by two parameterizations, that is, for all x and r one has
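The display equation referred to here can be reconstructed, in a hedged form consistent with the surrounding derivation (cumulative-logit CUP model with logistic distribution function \(F\)), as the equality of the two parameterizations:

```latex
\[
\pi\, F(\gamma_{0r} + x\gamma) + (1-\pi)\,\frac{r}{k}
\;=\;
\tilde{\pi}\, F(\tilde{\gamma}_{0r} + x\tilde{\gamma}) + (1-\tilde{\pi})\,\frac{r}{k},
\qquad r = 1,\dots,k-1 .
\]
```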
There are values \(\Delta _{0r}, \Delta \) such that \(\tilde{\gamma }_{0r}=\gamma _{0r}+\Delta _{0r}\), \(\tilde{\gamma }=\gamma +\Delta \). With \(\eta _r(x)=\gamma _{0r}+x\gamma \) one obtains for all x and r
Let us consider now the specific values \(x_z= - \gamma _{0r}/\gamma + z/\gamma \) yielding for all values z and r
Taking the difference of these equations for the values z and \(z-1\) one obtains for all values z
The equation has to hold in particular for the values \(z=1,2,\ldots \). Since the logistic distribution function \(F(\eta )= \exp (\eta )/(1+\exp (\eta ))\) is strictly monotonic and its derivative \(F'(\eta )= \exp (\eta )/(1+\exp (\eta ))^2\) takes different values at these points, it follows that \(\Delta _{0r}=\Delta =0\) and \(\pi =\tilde{\pi }\).
If the support of the covariate is finite one can consider different z-values. If \(x \in [l,u]\) (with \(\gamma \) positive) one considers the transformed values \(z_i= \gamma l+\gamma _{0r}+\gamma (u-l)i/M\), for \(i=1,\dots ,M\), where M is any natural number. Then for all transformed values \(x_{z_i}= - \gamma _{0r}/\gamma + z_i/\gamma \) one has \(x_{z_i} \in [l,u]\). Thus, Eq. (6) has to hold for M different values \(z_i\). Since M can be any natural number, the same argument as before yields \(\Delta _{0r}=\Delta =0\) and \(\pi =\tilde{\pi }\).
1.2 Estimation
The general CUP model is determined by the probability
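A plausible reconstruction of this display, using the notation of the surrounding text (ordinal-model probability \(P_{M}\) assumed as a stand-in name for the first component), is:

```latex
\[
P(R_i = r \mid \varvec{x}_i)
= \pi_i\, P_{M}(R_i = r \mid \varvec{x}_i)
+ (1-\pi_i)\,\frac{1}{k},
\qquad r = 1,\dots,k ,
\]
```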
where the first mixture component follows an ordinal model and the second represents the discrete uniform distribution.
For given data \((r_{i},\varvec{x}_{i})\), \(i=1,\dots ,n\), and collecting all parameters of the ordinal model used in the first mixture component in the parameter \(\varvec{\theta }\), the log-likelihood to be maximized is
The usual way to obtain estimates is to treat this as an incomplete-data problem and to solve the maximization problem by the EM algorithm. Therefore, let \(z_{i}\) denote the unknown mixture component, with \(z_{i}=1\) indicating that observation i comes from the first mixture component and \(z_{i}=0\) that it comes from the second. Then the complete density for \((r_i,z_i)\) is
yielding the complete log-likelihood
The EM algorithm treats \(z_{i}\) as missing data and maximizes the log-likelihood iteratively by using an expectation and a maximization step. During the E-step the conditional expectation of the complete log-likelihood given the observed data \(\varvec{r}\) and the current estimate \(\varvec{\theta }^{(s)}\),
has to be computed. Because \(l_c(\varvec{\theta })\) is linear in the unobservable data \(z_{i}\), it is only necessary to estimate the current conditional expectation of \(z_{i}\). From Bayes' theorem it follows that
This is the posterior probability that the observation \(r_i\) belongs to the first component of the mixture. For the s-th iteration one obtains
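A plausible form of this display, consistent with the posterior probability just described (denoting by \(p(r_i \mid \varvec{x}_i;\varvec{\theta })\) the probability under the ordinal model), is:

```latex
\[
\hat{z}_{i}^{(s)}
= \frac{\pi_i^{(s)}\, p\bigl(r_i \mid \varvec{x}_i; \varvec{\theta}^{(s)}\bigr)}
       {\pi_i^{(s)}\, p\bigl(r_i \mid \varvec{x}_i; \varvec{\theta}^{(s)}\bigr)
        + \bigl(1-\pi_i^{(s)}\bigr)/k} .
\]
```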
Thus, for given \(\varvec{\theta }^{(s)}\) one computes in the E-step the weights \(\hat{z}_{i}^{(s)}\) and in the M-step maximizes \(M(\varvec{\theta }|\varvec{\theta }^{(s)})\) (or rather \(M_{1}\) and \(M_{2}\)). If the mixture probabilities do not depend on covariates, that is, \(\pi _i=\pi \), one obtains
The E- and M-steps are repeated alternately until the difference \(L(\varvec{\theta }^{(s+1)})-L(\varvec{\theta }^{(s)})\) is small enough to assume convergence. Computation of \(\varvec{\theta }^{(s+1)}\) can be based on familiar maximization tools, because one maximizes a weighted log-likelihood of an ordinal model with known weights. In the case where only the intercepts are component-specific, the derivatives are very similar to the score function used in Gauss-Hermite quadrature, and a similar EM algorithm applies with an additional computation of the mixing distribution (see Aitkin 1999).
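As a concrete illustration of the alternating E- and M-steps, the following sketch implements the algorithm for the simplest special case, the classical CUB model without covariates, where the preference part is a shifted binomial and the M-step for its parameter \(\xi \) has a closed form. This is a minimal sketch, not the authors' implementation; the function names are illustrative.

```python
import numpy as np
from math import comb

def shifted_binom_pmf(r, k, xi):
    """P(R = r) for R - 1 ~ Binomial(k - 1, 1 - xi), r in {1, ..., k}."""
    return comb(k - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (k - r)

def em_cub(resp, k, max_iter=500, tol=1e-6):
    """EM for the mixture P(R = r) = pi * ShiftedBinomial(r; xi) + (1 - pi) / k."""
    resp = np.asarray(resp)
    pi, xi = 0.5, 0.5                         # start values
    ll_old = -np.inf
    for _ in range(max_iter):
        p1 = np.array([shifted_binom_pmf(r, k, xi) for r in resp])
        mix = pi * p1 + (1 - pi) / k          # mixture density at each response
        z = pi * p1 / mix                     # E-step: posterior weights
        pi = z.mean()                         # M-step: mixture probability
        xi = 1 - (z * (resp - 1)).sum() / ((k - 1) * z.sum())  # weighted MLE
        ll = np.log(mix).sum()                # observed-data log-likelihood
        if ll - ll_old < tol:                 # stop on small likelihood gain
            break
        ll_old = ll
    return pi, xi, ll
```

The closed-form \(\xi \)-update follows from setting the derivative of the weighted binomial log-likelihood to zero; with covariates in the preference part this step would be replaced by a weighted ordinal-model fit, as described above.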
Dempster et al. (1977) showed that under weak conditions the EM algorithm finds a local maximum of the likelihood function \(L(\varvec{\theta })\). Hence it is sensible to use different start values \(\varvec{\theta }^{(0)}\) to find the solution of the maximization problem.
If covariates determine the probability that observation i belongs to the first mixture component in the form of a logit model, \(\pi _{i}({\varvec{\beta }})=1/(1+\exp ({-\varvec{z}_{i}^{T}{\varvec{\beta }}}))\), \(M_{1}\) is the weighted log-likelihood of a binary logit model. Then \(M_{1}\) and \(M_{2}\) are maximized separately to obtain the next iteration. The simple update \(\pi ^{(s+1)}= \sum _{i=1}^{n}\hat{z}_{i}^{(s)}/n\) is replaced by
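A plausible reconstruction of the replacement update, namely maximization of the weighted binary-logit log-likelihood \(M_{1}\), is:

```latex
\[
\varvec{\beta}^{(s+1)}
= \mathop{\mathrm{arg\,max}}_{\varvec{\beta}}\;
\sum_{i=1}^{n} \hat{z}_{i}^{(s)} \log \pi_i(\varvec{\beta})
+ \bigl(1-\hat{z}_{i}^{(s)}\bigr) \log\bigl(1-\pi_i(\varvec{\beta})\bigr) ,
\]
```

which is exactly the log-likelihood of a binary logit model with the weights \(\hat{z}_{i}^{(s)}\) playing the role of the response.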
As the default stopping criterion we used the difference between two consecutive log-likelihood values; the algorithm was stopped once this difference fell below \(10^{-6}\).
Tutz, G., Schneider, M., Iannario, M. et al. Mixture models for ordinal responses to account for uncertainty of choice. Adv Data Anal Classif 11, 281–305 (2017). https://doi.org/10.1007/s11634-016-0247-9