Abstract
In CUB models the uncertainty of choice is explicitly modelled as a Combination of discrete Uniform and shifted Binomial random variables. The basic concept, namely modelling the response as a mixture of a deliberate choice of a response category and an uncertainty component represented by a uniform distribution over the response categories, is extended to a much wider class of models. In particular, the deliberate choice can be determined by classical ordinal response models such as the cumulative and the adjacent categories model. One then obtains the traditional and the flexible models as special cases when the uncertainty component is irrelevant. It is shown that the effect of explanatory variables is underestimated if the uncertainty component is neglected in a cumulative-type mixture model. Visualization tools for the effects of variables are proposed, and the modelling strategies are evaluated on real data sets. It is demonstrated that the extended class of models frequently yields a better fit than classical ordinal response models without an uncertainty component.
References
Agresti A (2010) Analysis of ordinal categorical data, 2nd edn. Wiley, New York
Agresti A (2013) Categorical data analysis, 3rd edn. Wiley, New York
Aitkin M (1999) A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55:117–128
Anderson JA (1984) Regression and ordered categorical variables. J Royal Stat Soc B 46:1–30
Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46:373–388
Brant R (1990) Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics 46:1171–1178
Breen R, Luijkx R (2010) Mixture models for ordinal data. Sociol Methods Res 39:3–24
Caffo B, An M-W, Rhode C (2007) Flexible random intercept models for binary outcomes using mixtures of normals. Comp Stat Data Anal 51:5220–5235
Cox C (1995) Location-scale cumulative odds models for ordinal data: A generalized non-linear model approach. Stat Med 14:1191–1203
D’Elia A, Piccolo D (2005) A mixture model for preference data analysis. Comp Stat Data Anal 49:917–934
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap, vol 57. CRC Press, London
Everitt BS (1988) A finite mixture model for the clustering of mixed-mode data. Stat Prob Lett 6(5):305–309
Fahrmeir L, Tutz G (2001) Multivariate statistical modelling based on generalized linear models. Springer, New York
Follmann DA, Lambert D (1991) Identifiability of finite mixtures of logistic regression models. J Stat Plan Infer 27(3):375–381
Gambacorta R, Iannario M (2013) Measuring job satisfaction with CUB models. Labour 27(2):198–224
Gertheiss J, Tutz G (2009) Penalized regression with ordinal predictors. Int Stat Rev 77:345–365
Gneiting T, Raftery A (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359–376
Greene W, Hensher D (2003) A latent class model for discrete choice analysis: contrasts with mixed logit. Trans Res Part B 39:681–689
Grilli L, Iannario M, Piccolo D, Rampichini C (2014) Latent class CUB models. Adv Data Anal Class 8(1):105–119
Grün B, Leisch F (2008) Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. J Class 25:225–247
Iannario M (2010) On the identifiability of a mixture model for ordinal data. Metron 68(1):87–94
Iannario M (2012a) Hierarchical CUB models for ordinal variables. Commun Stat Theory Methods 41:3110–3125
Iannario M (2012b) Modelling shelter choices in a class of mixture models for ordinal responses. Stat Methods Appl 21:1–22
Iannario M (2012c) Preliminary estimators for a mixture model of ordinal data. Adv Data Anal Class 6:163–184
Iannario M, Piccolo D (2010) Statistical modelling of subjective survival probabilities. Genus 66:17–42
Iannario M, Piccolo D (2012) CUB models: Statistical methods and empirical evidence. In: Kennett SSR (ed) Modern analysis of customer surveys: with applications using R. Wiley, New York, pp 231–258
Leroux BG (1992) Consistent estimation of a mixing distribution. Ann Stat 20:1350–1360
Liu Q, Agresti A (2005) The analysis of ordinal categorical data: An overview and a survey of recent developments. Test 14:1–73
Manisera M, Zuccolotto P (2014) Modeling rating data with nonlinear CUB models. Comp Stat Data Anal 78:100–118
McCullagh P (1980) Regression models for ordinal data (with discussion). J Royal Stat Soc B 42:109–127
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Mehta CR, Patel NR, Tsiatis AA (1984) Exact significance testing to establish treatment equivalence with ordered categorical data. Biometrics 40:819–825
Nair VN (1987) Chi-squared-type tests for ordered alternatives in contingency tables. J Am Stat Assoc 82:283–291
Peterson B, Harrell FE (1990) Partial proportional odds models for ordinal response variables. Appl Stat 39:205–217
Piccolo D (2003) On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Stat 5:85–104
Piccolo D (2006) Observed information matrix in MUB models. Quaderni di Stat 8:33–78
Tutz G (2012) Regression for categorical data. Cambridge University Press, Cambridge
Tutz G, Gertheiss J (2014) Rating scales as predictors—the old question of scale level and some answers. Psychometrika 79:357–376
Tutz G, Schauberger G (2013) Visualization of categorical response models - from data glyphs to parameter glyphs. J Comp Graph Stat 22(1):156–177
Wedel M, DeSarbo W (1995) A mixture likelihood approach for generalized linear models. J Class 12:21–55
Acknowledgments
This work has been partially supported by FIRB2012 project (Code RBFR12SHVV) at University of Perugia and the frame of Programme STAR (CUP E68C13000020003) at University of Naples Federico II, financially supported by UniNA and Compagnia di San Paolo. ISFOL survey data has been used under the agreement ISFOL/PLUS 2006/430.
Appendix
1.1 Identifiability
We assume that the number of categories is greater than 2 (\(k>2\)) and that there is an effect of a continuous covariate x, that is \(\gamma \ne 0\). Let the CUP model with the cumulative logit model in the preference part be represented by two parameterizations, that is, for all x and r one has
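The display equation referred to here can be reconstructed, in a hedged form consistent with the surrounding derivation (cumulative-logit CUP model with logistic distribution function \(F\)), as the equality of the two parameterizations:

```latex
\[
\pi\, F(\gamma_{0r} + x\gamma) + (1-\pi)\,\frac{r}{k}
\;=\;
\tilde{\pi}\, F(\tilde{\gamma}_{0r} + x\tilde{\gamma}) + (1-\tilde{\pi})\,\frac{r}{k},
\qquad r = 1,\dots,k-1 .
\]
```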
There are values \(\Delta _{0r}, \Delta \) such that \(\tilde{\gamma }_{0r}=\gamma _{0r}+\Delta _{0r}\), \(\tilde{\gamma }=\gamma +\Delta \). With \(\eta _r(x)=\gamma _{0r}+x\gamma \) one obtains for all x and r
Let us consider now the specific values \(x_z= - \gamma _{0r}/\gamma + z/\gamma \) yielding for all values z and r
Taking the difference of these equations for the values z and \(z-1\) one obtains for all values z
The equation has to hold in particular for the values \(z=1,2,\ldots \). Since the logistic distribution function \(F(\eta )= \exp (\eta )/(1+\exp (\eta ))\) is strictly monotonic and its derivative \(F'(\eta )= \exp (\eta )/(1+\exp (\eta ))^2\) takes different values at these points, it follows that \(\Delta _{0r}=\Delta =0\) and \(\pi =\tilde{\pi }\).
If the support of the covariate is finite one can consider different z-values. If \(x \in [l,u]\) (with \(\gamma \) positive) one considers the transformed values \(z_i= \gamma l+\gamma _{0r}+\gamma (u-l)i/M\), for \(i=1,\dots ,M\), where M is any natural number. Then for all transformed values \(x_{z_i}= - \gamma _{0r}/\gamma + z_i/\gamma \) one has \(x_{z_i} \in [l,u]\). Thus, Eq. (6) has to hold for M different values \(z_i\). Since M can be any natural number, the same argument as before yields \(\Delta _{0r}=\Delta =0\) and \(\pi =\tilde{\pi }\).
1.2 Estimation
The general CUP model is determined by the probability
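A plausible reconstruction of this display, using the notation of the surrounding text (ordinal-model probability \(P_{M}\) assumed as a stand-in name for the first component), is:

```latex
\[
P(R_i = r \mid \varvec{x}_i)
= \pi_i\, P_{M}(R_i = r \mid \varvec{x}_i)
+ (1-\pi_i)\,\frac{1}{k},
\qquad r = 1,\dots,k ,
\]
```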
where the first mixture component follows an ordinal model and the second represents the discrete uniform distribution.
For given data \((r_{i},\varvec{x}_{i})\), \(i=1,\dots ,n\), and collecting all parameters of the ordinal model used in the first mixture component in the parameter \(\varvec{\theta }\), the log-likelihood to be maximized is
The usual way to obtain estimates is to treat this as an incomplete-data problem and to solve the maximization problem by the EM algorithm. Therefore, let \(z_{i}\) denote the unknown mixture component, with \(z_{i}=1\) indicating that observation i comes from the first mixture component and \(z_{i}=0\) that it comes from the second. Then the complete density for \((r_i,z_i)\) is
yielding the complete log-likelihood
The EM algorithm treats \(z_{i}\) as missing data and maximizes the log-likelihood iteratively by using an expectation and a maximization step. During the E-step the conditional expectation of the complete log-likelihood given the observed data \(\varvec{r}\) and the current estimate \(\varvec{\theta }^{(s)}\),
has to be computed. Because \(l_c(\varvec{\theta })\) is linear in the unobservable data \(z_{i}\), it is only necessary to estimate the current conditional expectation of \(z_{i}\). From Bayes' theorem it follows that
This is the posterior probability that the observation \(r_i\) belongs to the first component of the mixture. For the s-th iteration one obtains
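A plausible form of this display, consistent with the posterior probability just described (denoting by \(p(r_i \mid \varvec{x}_i;\varvec{\theta })\) the probability under the ordinal model), is:

```latex
\[
\hat{z}_{i}^{(s)}
= \frac{\pi_i^{(s)}\, p\bigl(r_i \mid \varvec{x}_i; \varvec{\theta}^{(s)}\bigr)}
       {\pi_i^{(s)}\, p\bigl(r_i \mid \varvec{x}_i; \varvec{\theta}^{(s)}\bigr)
        + \bigl(1-\pi_i^{(s)}\bigr)/k} .
\]
```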
Thus, for given \(\varvec{\theta }^{(s)}\) one computes in the E-step the weights \(\hat{z}_{i}^{(s)}\) and in the M-step maximizes \(M(\varvec{\theta }|\varvec{\theta }^{(s)})\) (or rather \(M_{1}\) and \(M_{2}\)). If the mixture probabilities do not depend on covariates, that is, \(\pi _i=\pi \), one obtains
The E- and M-steps are repeated alternately until the difference \(L(\varvec{\theta }^{(s+1)})-L(\varvec{\theta }^{(s)})\) is small enough to assume convergence. Computation of \(\varvec{\theta }^{(s+1)}\) can be based on familiar maximization tools, because one maximizes a weighted log-likelihood of an ordinal model with known weights. In the case where only the intercepts are component-specific, the derivatives are very similar to the score function used in Gauss-Hermite quadrature, and a similar EM algorithm applies with an additional computation of the mixing distribution (see Aitkin 1999).
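As a concrete illustration of the alternating E- and M-steps, the following sketch implements the algorithm for the simplest special case, the classical CUB model without covariates, where the preference part is a shifted binomial and the M-step for its parameter \(\xi \) has a closed form. This is a minimal sketch, not the authors' implementation; the function names are illustrative.

```python
import numpy as np
from math import comb

def shifted_binom_pmf(r, k, xi):
    """P(R = r) for R - 1 ~ Binomial(k - 1, 1 - xi), r in {1, ..., k}."""
    return comb(k - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (k - r)

def em_cub(resp, k, max_iter=500, tol=1e-6):
    """EM for the mixture P(R = r) = pi * ShiftedBinomial(r; xi) + (1 - pi) / k."""
    resp = np.asarray(resp)
    pi, xi = 0.5, 0.5                         # start values
    ll_old = -np.inf
    for _ in range(max_iter):
        p1 = np.array([shifted_binom_pmf(r, k, xi) for r in resp])
        mix = pi * p1 + (1 - pi) / k          # mixture density at each response
        z = pi * p1 / mix                     # E-step: posterior weights
        pi = z.mean()                         # M-step: mixture probability
        xi = 1 - (z * (resp - 1)).sum() / ((k - 1) * z.sum())  # weighted MLE
        ll = np.log(mix).sum()                # observed-data log-likelihood
        if ll - ll_old < tol:                 # stop on small likelihood gain
            break
        ll_old = ll
    return pi, xi, ll
```

The closed-form \(\xi \)-update follows from setting the derivative of the weighted binomial log-likelihood to zero; with covariates in the preference part this step would be replaced by a weighted ordinal-model fit, as described above.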
Dempster et al. (1977) showed that under weak conditions the EM algorithm finds a local maximum of the likelihood function \(L(\varvec{\theta })\). Hence it is sensible to use different start values \(\varvec{\theta }^{(0)}\) to find the solution of the maximization problem.
If covariates determine the probability that observation i belongs to the first mixture component in the form of a logit model, \(\pi _{i}({\varvec{\beta }})=1/(1+\exp ({-\varvec{z}_{i}^{T}{\varvec{\beta }}}))\), \(M_{1}\) is the weighted log-likelihood of a binary logit model. Then \(M_{1}\) and \(M_{2}\) are maximized separately to obtain the next iteration. The simple update \(\pi ^{(s+1)}= \sum _{i=1}^{n}\hat{z}_{i}^{(s)}/n\) is replaced by
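A plausible reconstruction of the replacement update, namely maximization of the weighted binary-logit log-likelihood \(M_{1}\), is:

```latex
\[
\varvec{\beta}^{(s+1)}
= \mathop{\mathrm{arg\,max}}_{\varvec{\beta}}\;
\sum_{i=1}^{n} \hat{z}_{i}^{(s)} \log \pi_i(\varvec{\beta})
+ \bigl(1-\hat{z}_{i}^{(s)}\bigr) \log\bigl(1-\pi_i(\varvec{\beta})\bigr) ,
\]
```

which is exactly the log-likelihood of a binary logit model with the weights \(\hat{z}_{i}^{(s)}\) playing the role of the response.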
As the default stopping criterion we used the difference between two consecutive log-likelihood values; the algorithm was stopped once this difference fell below \(10^{-6}\).
Tutz, G., Schneider, M., Iannario, M. et al. Mixture models for ordinal responses to account for uncertainty of choice. Adv Data Anal Classif 11, 281–305 (2017). https://doi.org/10.1007/s11634-016-0247-9