Mixture models for ordinal responses to account for uncertainty of choice

  • Regular Article, published in Advances in Data Analysis and Classification

Abstract

In CUB models the uncertainty of choice is explicitly modelled as a Combination of discrete Uniform and shifted Binomial random variables. The basic concept, namely modelling the response as a mixture of a deliberate choice of a response category and an uncertainty component represented by a uniform distribution over the response categories, is extended to a much wider class of models. In particular, the deliberate choice can be determined by classical ordinal response models such as the cumulative and the adjacent categories model. One then obtains the traditional models as special cases when the uncertainty component is irrelevant. It is shown that the effect of explanatory variables is underestimated if the uncertainty component is neglected in a cumulative type mixture model. Visualization tools for the effects of variables are proposed, and the modelling strategies are evaluated on real data sets. It is demonstrated that the extended class of models frequently yields a better fit than classical ordinal response models without an uncertainty component.


References

  • Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, New York

  • Agresti A (2013) Categorical data analysis, 3rd edn. Wiley, New York

  • Aitkin M (1999) A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55:117–128

  • Anderson JA (1984) Regression and ordered categorical variables. J Royal Stat Soc B 46:1–30

  • Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46:373–388

  • Brant R (1990) Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 46:1171–1178

  • Breen R, Luijkx R (2010) Mixture models for ordinal data. Sociol Methods Res 39:3–24

  • Caffo B, An M-W, Rhode C (2007) Flexible random intercept models for binary outcomes using mixtures of normals. Comp Stat Data Anal 51:5220–5235

  • Cox C (1995) Location-scale cumulative odds models for ordinal data: a generalized non-linear model approach. Stat Med 14:1191–1203

  • D’Elia A, Piccolo D (2005) A mixture model for preference data analysis. Comp Stat Data Anal 49:917–934

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38

  • Efron B, Tibshirani RJ (1994) An introduction to the bootstrap, vol 57. CRC Press, London

  • Everitt BS (1988) A finite mixture model for the clustering of mixed-mode data. Stat Prob Lett 6(5):305–309

  • Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models. Springer, New York

  • Follmann DA, Lambert D (1991) Identifiability of finite mixtures of logistic regression models. J Stat Plan Infer 27(3):375–381

  • Gambacorta R, Iannario M (2013) Measuring job satisfaction with CUB models. Labour 27(2):198–224

  • Gertheiss J, Tutz G (2009) Penalized regression with ordinal predictors. Int Stat Rev 77:345–365

  • Gneiting T, Raftery A (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359–376

  • Greene W, Hensher D (2003) A latent class model for discrete choice analysis: contrasts with mixed logit. Trans Res Part B 39:681–689

  • Grilli L, Iannario M, Piccolo D, Rampichini C (2014) Latent class CUB models. Adv Data Anal Class 8(1):105–119

  • Grün B, Leisch F (2008) Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. J Class 25:225–247

  • Iannario M (2010) On the identifiability of a mixture model for ordinal data. Metron 68(1):87–94

  • Iannario M (2012a) Hierarchical CUB models for ordinal variables. Commun Stat Theory Methods 41:3110–3125

  • Iannario M (2012b) Modelling shelter choices in a class of mixture models for ordinal responses. Stat Methods Appl 21:1–22

  • Iannario M (2012c) Preliminary estimators for a mixture model of ordinal data. Adv Data Anal Class 6:163–184

  • Iannario M, Piccolo D (2010) Statistical modelling of subjective survival probabilities. Genus 66:17–42

  • Iannario M, Piccolo D (2012) CUB models: statistical methods and empirical evidence. In: Kennett SSR (ed) Modern analysis of customer surveys: with applications using R. Wiley, New York, pp 231–258

  • Leroux BG (1992) Consistent estimation of a mixing distribution. Ann Stat 20:1350–1360

  • Liu Q, Agresti A (2005) The analysis of ordinal categorical data: an overview and a survey of recent developments. Test 14:1–73

  • Manisera M, Zuccolotto P (2014) Modeling rating data with nonlinear CUB models. Comp Stat Data Anal 78:100–118

  • McCullagh P (1980) Regression models for ordinal data (with discussion). J Royal Stat Soc B 42:109–127

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  • Mehta CR, Patel NR, Tsiatis AA (1984) Exact significance testing to establish treatment equivalence with ordered categorical data. Biometrics 40:819–825

  • Nair VN (1987) Chi-squared-type tests for ordered alternatives in contingency tables. J Am Stat Assoc 82:283–291

  • Peterson B, Harrell FE (1990) Partial proportional odds models for ordinal response variables. Appl Stat 39:205–217

  • Piccolo D (2003) On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Stat 5:85–104

  • Piccolo D (2006) Observed information matrix in MUB models. Quaderni di Stat 8:33–78

  • Tutz G (2012) Regression for categorical data. Cambridge University Press, Cambridge

  • Tutz G, Gertheiss J (2014) Rating scales as predictors—the old question of scale level and some answers. Psychometrika 79:357–376

  • Tutz G, Schauberger G (2013) Visualization of categorical response models: from data glyphs to parameter glyphs. J Comp Graph Stat 22(1):156–177

  • Wedel M, DeSarbo W (1995) A mixture likelihood approach for generalized linear models. J Class 12:21–55

Acknowledgments

This work has been partially supported by the FIRB 2012 project (code RBFR12SHVV) at the University of Perugia and by the Programme STAR (CUP E68C13000020003) at the University of Naples Federico II, financially supported by UniNA and Compagnia di San Paolo. The ISFOL survey data have been used under the agreement ISFOL/PLUS 2006/430.

Author information

Corresponding author

Correspondence to Gerhard Tutz.

Appendix

1.1 Identifiability

We assume that the number of categories is greater than 2 (\(k>2\)) and that there is an effect of a continuous covariate x, that is \(\gamma \ne 0\). Let the CUP model with the cumulative logit model in the preference part be represented by two parameterizations, that is, for all x and r one has

$$\begin{aligned} \pi F(\gamma _{0r}+x\gamma ) + (1-\pi )r/k = \tilde{\pi }F(\tilde{\gamma }_{0r}+x\tilde{\gamma }) + (1-\tilde{\pi })r/k. \end{aligned}$$

There are values \(\Delta _{0r}, \Delta \) such that \(\tilde{\gamma }_{0r}=\gamma _{0r}+\Delta _{0r}\), \(\tilde{\gamma }=\gamma +\Delta \). With \(\eta _r(x)=\gamma _{0r}+x\gamma \) one obtains for all x and r

$$\begin{aligned} \pi F(\eta _r(x)) - \tilde{\pi }F(\eta _r(x)+\Delta _{0r}+ x\Delta ) = (\pi -\tilde{\pi })r/k. \end{aligned}$$

Let us now consider the specific values \(x_z= - \gamma _{0r}/\gamma + z/\gamma \), yielding for all values z and r

$$\begin{aligned} \pi F(z) - \tilde{\pi }F(z+\Delta _{0r}+ x_z\Delta ) = (\pi -\tilde{\pi })r/k. \end{aligned}$$

Taking the difference of these equations for the values z and \(z-1\) one obtains, for all values z,

$$\begin{aligned} \pi (F(z) - F(z-1)) = \tilde{\pi }(F(z+\Delta _{0r}+ x_z\Delta ) - F(z-1+\Delta _{0r}+ x_z\Delta )). \end{aligned}$$
(6)

The equation has to hold in particular for the values \(z=1,2,\ldots \). Since the logistic distribution function \(F(\eta )= \exp (\eta )/(1+\exp (\eta ))\) is strictly monotonic and the derivative \(F'(\eta )= \exp (\eta )/(1+\exp (\eta ))^2\) takes different values as \(\eta \) varies, it follows that \(\Delta _{0r}=\Delta =0\) and \(\pi =\tilde{\pi }\).

If the support of the covariate is finite one can consider different z-values. If \(x \in [l,u]\) (for \(\gamma \) positive) one considers the transformed values \(z_i= \gamma l+\gamma _{0r}+\gamma (u-l)i/M\), for \(i=1,\dots ,M\), where M is any natural number. Then for all transformed values \(x_{z_i}= - \gamma _{0r}/\gamma + z_i/\gamma \) one has \(x_{z_i} \in [l,u]\). Thus, Eq. (6) has to hold for M different values \(z_i\). Since M can be any natural number, the same argument as before yields \(\Delta _{0r}=\Delta =0\) and \(\pi =\tilde{\pi }\).

1.2 Estimation

The general CUP model is determined by the probability

$$\begin{aligned} P(r_{i}|\varvec{x}_i)= \pi _i P_M(r_{i}|\varvec{x}_i) + (1- \pi _i)P_U(r_{i}), \end{aligned}$$

where the first mixture component follows an ordinal model and the second represents the discrete uniform distribution.
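As a concrete illustration, the mixture probabilities can be evaluated directly when the preference part is a cumulative logit model. The following is a minimal sketch in Python; the function and variable names are ours and not part of any published software.

```python
# Minimal sketch of CUP category probabilities with a cumulative logit
# preference part; all names are illustrative.
import numpy as np

def cup_probs(x, gamma0, gamma, pi, k):
    """P(R = r | x), r = 1, ..., k, for the CUP mixture.

    gamma0 : ordered intercepts gamma_{01} <= ... <= gamma_{0,k-1}
    gamma  : covariate coefficient of the preference part
    pi     : weight of the preference (first mixture) component
    """
    eta = np.append(gamma0 + x * gamma, np.inf)   # eta_r = gamma_{0r} + x * gamma
    F = 1.0 / (1.0 + np.exp(-eta))                # logistic cdf F(eta_r)
    p_m = np.diff(np.concatenate(([0.0], F)))     # P_M(r|x) = F(eta_r) - F(eta_{r-1})
    return pi * p_m + (1.0 - pi) / k              # add the uniform component

# Example with k = 5 categories and a single covariate; probs sums to 1:
probs = cup_probs(x=0.3, gamma0=np.array([-1.5, -0.5, 0.5, 1.5]),
                  gamma=1.0, pi=0.7, k=5)
```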

For given data \((r_{i},\varvec{x}_{i})\), \(i=1,\dots ,n\), and collecting all parameters of the ordinal model used in the first mixture component in the parameter vector \(\varvec{\theta }\), the log-likelihood to be maximized is

$$\begin{aligned} l(\varvec{\theta })=\sum _{i=1}^n \log (\pi _i P_M(r_i|\varvec{x}_i) + (1- \pi _i)P_U(r_i)). \end{aligned}$$
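In the same vein, the log-likelihood can be sketched as a sum over the observations (a constant mixture weight \(\pi _i=\pi \) is assumed here for simplicity):

```python
# Observed-data log-likelihood of the CUP model; builds on cup_probs above.
def cup_loglik(r, x, gamma0, gamma, pi, k):
    return sum(np.log(cup_probs(x_i, gamma0, gamma, pi, k)[r_i - 1])
               for r_i, x_i in zip(r, x))   # categories r_i in 1, ..., k
```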

The usual way to obtain estimates is to treat this as an incomplete-data problem and to solve the maximization by the EM algorithm. Therefore, let \(z_{i}\) denote the unknown mixture component, with \(z_{i}=1\) indicating that observation i stems from the first mixture component and \(z_{i}=0\) that it stems from the second. Then the complete density for \((r_i,z_i)\) is

$$\begin{aligned} P(r_i, z_i \vert \varvec{x}_{i}, \varvec{\theta })= P(r_i \vert z_{i}, \varvec{x}_{i}, \varvec{\theta })P( z_{i} )=P_M(r_i|\varvec{x}_i)^{z_i}P_U(r_i)^{1-z_i}\pi _i^{z_i}(1-\pi _i)^{1-z_i} \end{aligned}$$

yielding the complete log-likelihood

$$\begin{aligned} l_c(\varvec{\theta })&= \sum _{i=1}^{n} \log (P(r_i, z_i \vert \varvec{x}_{i}, \varvec{\theta }))\\&= \sum _{i=1}^{n} z_i (\log (P_M(r_i|\varvec{x}_i))+ \log (\pi _i))+ (1-z_i)(\log (P_U(r_i))+ \log (1-\pi _i)). \end{aligned}$$

The EM algorithm treats \(z_{i}\) as missing data and maximizes the log-likelihood iteratively by using an expectation and a maximization step. During the E-step the conditional expectation of the complete log-likelihood given the observed data \(\varvec{r}\) and the current estimate \(\varvec{\theta }^{(s)}\),

$$\begin{aligned} M(\varvec{\theta }|\varvec{\theta }^{(s)})={\text {E}}\left( l_c(\varvec{\theta })|\varvec{r}, \varvec{\theta }^{(s)}\right) \end{aligned}$$

has to be computed. Because \(l_c(\varvec{\theta })\) is linear in the unobservable data \(z_{i}\), it suffices to compute the current conditional expectation of \(z_{i}\). From Bayes' theorem it follows that

$$\begin{aligned} {\text {E}}(z_{i}|\varvec{r},\varvec{\theta })&= P\left( z_{i}=1| r_i,\varvec{x}_{i},\varvec{\theta }\right) \\&=P\left( r_i|z_{i}=1,\varvec{x}_{i},\varvec{\theta }\right) P(z_{i}=1|\varvec{x}_{i},\varvec{\theta })/P(r_i|\varvec{x}_{i},\varvec{\theta }) \\&=\pi _i P_M(r_i|\varvec{x}_{i},\varvec{\theta })/P( r_i|\varvec{x}_{i},\varvec{\theta })= \hat{z}_{i}. \end{aligned}$$

This is the posterior probability that the observation \(r_i\) belongs to the first component of the mixture. For the s-th iteration one obtains

$$\begin{aligned} M(\varvec{\theta }|\varvec{\theta }^{(s)})&= \sum _{i=1}^{n} \hat{z}_{i}^{(s)}\left( \log (\pi _{i})+ \log (P_M(r_i \vert \varvec{x}_{i},\varvec{\theta }))\right) \\&\quad +(1-\hat{z}_{i}^{(s)})\left( \log (1-\pi _{i})+ \log (P_U(r_i))\right) \\&= \underbrace{\sum _{i=1}^{n} \hat{z}_{i}^{(s)}\log (\pi _{i})+ (1-\hat{z}_{i}^{(s)})(\log (1-\pi _{i}))}_{M_{1}} \\&\quad + \underbrace{\sum _{i=1}^{n} \hat{z}_{i}^{(s)}\log (P_M(r_i \vert \varvec{x}_{i},\varvec{\theta }))+(1-\hat{z}_{i}^{(s)})\log (P_U(r_i))}_{M_{2}}. \end{aligned}$$
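In code, the posterior weights resulting from Bayes' theorem can be computed as in the following sketch (same illustrative setting as above):

```python
# E-step sketch: z_hat_i = pi * P_M(r_i|x_i) / P(r_i|x_i).
def e_step(r, x, gamma0, gamma, pi, k):
    z_hat = np.empty(len(r))
    for i, (r_i, x_i) in enumerate(zip(r, x)):
        p_m = cup_probs(x_i, gamma0, gamma, 1.0, k)[r_i - 1]  # pure preference part
        z_hat[i] = pi * p_m / (pi * p_m + (1.0 - pi) / k)     # posterior probability
    return z_hat
```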

Thus, for given \(\varvec{\theta }^{(s)}\) one computes in the E-step the weights \(\hat{z}_{i}^{(s)}\) and in the M-step maximizes \(M(\varvec{\theta }|\varvec{\theta }^{(s)})\) (or rather \(M_{1}\) and \(M_{2}\)). If the mixture probabilities do not depend on covariates, that is, \(\pi _i=\pi \), one obtains

$$\begin{aligned} \pi ^{(s+1)}= \frac{1}{n}\sum _{i=1}^{n}\hat{z}_{i}^{(s)}\quad \text {and} \quad \varvec{\theta }^{(s+1)}= {\text {argmax}}_{\varvec{\theta }}\sum _{i=1}^{n}\hat{z}_{i}^{(s)}\log (P_M(r_i|\varvec{x}_{i},\varvec{\theta })). \end{aligned}$$

The E- and M-steps are alternated until the difference \(L(\varvec{\theta }^{(s+1)})-L(\varvec{\theta }^{(s)})\) is small enough to assume convergence. Computation of \(\varvec{\theta }^{(s+1)}\) can be based on familiar maximization tools, because one maximizes a weighted log-likelihood of an ordinal model with known weights. In the case where only the intercepts are component-specific, the derivatives are very similar to the score function used with Gauss-Hermite quadrature, and a similar EM algorithm applies with an additional calculation of the mixing distribution (see Aitkin 1999).
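Putting the pieces together, a compact EM loop might look as follows. This is only a sketch under the assumptions above (constant \(\pi \), a single covariate); a generic numerical optimizer stands in for whatever routine fits the weighted ordinal model:

```python
# EM sketch for the CUP model; minimize() serves as a generic stand-in for a
# weighted ordinal-model fitter. theta stacks the k-1 intercepts and the slope.
from scipy.optimize import minimize

def em_cup(r, x, k, theta0, pi0=0.5, tol=1e-6, max_iter=500):
    def unpack(theta):
        # np.sort is a simple device to keep the intercepts ordered
        return np.sort(theta[:k - 1]), theta[k - 1]

    def weighted_negloglik(theta, z_hat):
        # M2 without the (1 - z_i) log P_U(r_i) term, which is constant in theta
        gamma0, gamma = unpack(theta)
        return -sum(z_i * np.log(cup_probs(x_i, gamma0, gamma, 1.0, k)[r_i - 1])
                    for z_i, r_i, x_i in zip(z_hat, r, x))

    theta, pi, old_ll = np.asarray(theta0, dtype=float), pi0, -np.inf
    for _ in range(max_iter):
        z_hat = e_step(r, x, *unpack(theta), pi, k)                   # E-step
        pi = z_hat.mean()                                             # M-step for pi
        theta = minimize(weighted_negloglik, theta, args=(z_hat,)).x  # M-step for theta
        ll = cup_loglik(r, x, *unpack(theta), pi, k)
        if ll - old_ll < tol:                                         # stopping rule
            break
        old_ll = ll
    return unpack(theta), pi
```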

Dempster et al. (1977) showed that under weak conditions the EM algorithm finds a local maximum of the likelihood function \(L(\varvec{\theta })\). Hence it is sensible to use different start values \(\varvec{\theta }^{(0)}\) to guard against convergence to a merely local maximum.

If covariates determine the probability that observation i belongs to the first mixture component in the form of a logit model, \(\pi _{i}({\varvec{\beta }})=1/(1+\exp ({-\varvec{z}_{i}^{T}{\varvec{\beta }}}))\), \(M_{1}\) is the weighted log-likelihood of a binary logit model. Then \(M_{1}\) and \(M_{2}\) are maximized separately to obtain the next iteration. The simple update \(\pi ^{(s+1)}= \sum _{i=1}^{n}\hat{z}_{i}^{(s)}/n\) is replaced by

$$\begin{aligned} {\varvec{\beta }}^{(s+1)}= {\text {argmax}}_{{\varvec{\beta }}}\sum _{i=1}^{n} \hat{z}_{i}^{(s)}\log (\pi _{i}({\varvec{\beta }}))+ (1-\hat{z}_{i}^{(s)})(\log (1-\pi _{i}({\varvec{\beta }}))). \end{aligned}$$
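A corresponding sketch of this weighted binary logit step, where Z denotes the design matrix of the mixture-probability model and the posterior weights \(\hat{z}_{i}\) act as fractional binary responses (again illustrative names only):

```python
# M1-step sketch when pi_i follows a logit model pi_i = 1/(1 + exp(-z_i' beta)).
def m1_step(Z, z_hat, beta0):
    def negloglik(beta):
        pi_i = 1.0 / (1.0 + np.exp(-Z @ beta))
        return -np.sum(z_hat * np.log(pi_i) + (1.0 - z_hat) * np.log(1.0 - pi_i))
    return minimize(negloglik, beta0).x   # next iterate beta^{(s+1)}
```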

As default stopping criterion we used the difference of two consecutive likelihood values; the algorithm was stopped once the difference fell below \(10^{-6}\).

About this article


Cite this article

Tutz, G., Schneider, M., Iannario, M. et al. Mixture models for ordinal responses to account for uncertainty of choice. Adv Data Anal Classif 11, 281–305 (2017). https://doi.org/10.1007/s11634-016-0247-9
