Treatment of ‘don’t know’ responses in a mixture model for rating data

Abstract

In recent years the use of questionnaires to investigate and measure human perceptions has grown considerably and, correspondingly, there has been an increased need for statistical models able to treat ordered categorical data, which usually derive from questions asking for ratings. In this paper we focus on a specific class of models, called Combination of Uniform and shifted Binomial (CUB), based on the assumption that rating data derive from an unconscious decision process composed of two independent components, called feeling and uncertainty. More precisely, we deal with a recently proposed extension in this context, the Nonlinear CUB model, which, in addition to measuring feeling and uncertainty, gives an idea of the state of mind of respondents toward the scale used to express the ratings. The aim of the paper is to show how the parameters of the Nonlinear CUB model can be adjusted to take account of the presence of ‘don’t know’ responses, following a recent idea developed in the CUB context. In addition, a graphical representation is proposed that summarizes all the results in a single graph. A case study based on data from the Eurobarometer survey is presented.

References

  1. Agresti, A.: Categorical Data Analysis, 3rd edn. Wiley, New York (2013)

  2. Bacci, S., Bartolucci, F.: A multidimensional finite mixture structural equation model for nonignorable missing responses to test items. Struct. Equ. Model. 00, 1–14 (2015)

  3. Balirano, G., Corduas, M.: Detecting semiotically expressed humor in diasporic tv productions. Humor 3, 227–251 (2008)

  4. Beatty, P., Herrmann, D.: A framework for evaluating “don’t know” responses in surveys. In: Proceedings of the Section on Survey Research Methods. American Statistical Association, pp. 1005–1010 (1995)

  5. Bishop, G.F., Tuchfarber, A.J., Oldendick, R.W.: Opinions on fictitious issues: the pressure to answer survey questions. Publ. Opin. Q. 50, 240–250 (1986)

  6. Capecchi, S., Piccolo, D.: Modelling the latent components of personal happiness. In: Perna, C., Sibillo, M. (eds.) Mathematical and Statistical Methods for Actuarial Sciences and Finance. Springer, Berlin, pp. 49–52 (2014)

  7. Corduas, M., Iannario, M., Piccolo, D.: A class of statistical models for evaluating services and performances. In: Monari, P., Bini, M., Piccolo, D., Salmaso, L. (eds.) Statistical Methods for the Evaluation of Educational Services and Quality of products. Springer, Berlin, pp. 99–117 (2009)

  8. D’Elia, A.: A statistical modelling approach for the analysis of tmd chronic pain data. Stat. Methods Med. Res. 17, 389–403 (2008)

  9. D’Elia, A., Piccolo, D.: A mixture model for preference data analysis. Comput. Stat. Data Anal. 49, 917–934 (2005)

  10. Gambacorta, R., Iannario, M.: Measuring job satisfaction with CUB models. Labour 27, 198–224 (2013)

  11. Gambacorta, R., Iannario, M., Valliant, R.: Design-based inference in a mixture model for ordinal variables for a two stage stratified design. Aust. N. Z. J. Stat. 56, 125–143 (2014)

  12. Grilli, L., Iannario, M., Piccolo, D., Rampichini, C.: Latent class CUB models. Adv. Data Anal. Classif. 8, 105–119 (2013)

  13. Iannario, M.: On the identifiability of a mixture model for ordinal data. Metron LXVIII, pp. 87–94 (2010)

  14. Iannario, M.: Hierarchical CUB models for ordinal variables. Commun. Stat.-Theor. Methods 41, 3110–3125 (2012a)

  15. Iannario, M.: Modelling shelter choices in a class of mixture models for ordinal responses. Stat. Method Appl. 20, 1–22 (2012b)

  16. Iannario, M.: CUBE models for interpreting ordered categorical data with overdispersion. Quad. Stat. 14, 137–140 (2012c)

  17. Iannario, M.: Modelling uncertainty and overdispersion in ordinal data. Commun. Stat.-Theor. Methods 43, 771–786 (2014)

  18. Iannario, M., Manisera, M., Piccolo, D., Zuccolotto, P.: Sensory analysis in the food industry as a tool for marketing decisions. Adv. Data Anal. Classif. 6, 303–321 (2012)

  19. Iannario, M., Piccolo, D.: A new statistical model for the analysis of customer satisfaction. Qual. Technol. Quant. Manag. 7, 149–168 (2010a)

  20. Iannario, M., Piccolo, D.: Statistical modelling of subjective survival probabilities. GENUS LXVI, 17–42 (2010b)

  21. Iannario, M., Piccolo, D.: CUB models: statistical methods and empirical evidence. In: Kenett, R.S., Salini, S. (eds.) Modern Analysis of Customer Surveys, pp. 231–258. Wiley, New York (2012)

  22. Iannario, M., Piccolo, D.: A generalized framework for modelling ordinal data. Stat. Method Appl. (2015). doi:10.1007/s10260-015-0316-9

  23. Lord, F.: Maximum likelihood estimation of item response parameter when some responses are omitted. Psychometrika 48, 477–482 (1983)

  24. Manisera, M., Zuccolotto, P.: Nonlinear CUB models: some stylized facts. QdS—J. Methodol. Appl. Stat. 1–2 (2013)

  25. Manisera, M., Zuccolotto, P.: Modeling “don’t know” responses in rating scales. Pattern Recogn. Lett. 45, 226–234 (2014a)

  26. Manisera, M., Zuccolotto, P.: Modeling rating data with nonlinear CUB models. Comput. Stat. Data Anal. 78, 100–118 (2014b)

  27. Manisera, M., Zuccolotto, P.: Numerical optimization and EM algorithm in a mixture model for human perceptions analysis. Working paper (2015a)

  28. Manisera, M., Zuccolotto, P.: On the identifiability of Nonlinear CUB models. J. Multivar. Anal. 140, 302–316 (2015b)

  29. Oberski, D., Vermunt, J.: The CUB model and its variations are restricted loglinear latent class models. EJASA (2015). http://daob.nl/wp-content/papercite-data/pdf/oberski-wp-cub-is-lcm.pdf

  30. Piccolo, D.: On the moments of a mixture of Uniform and shifted Binomial random variables. Quad. Stat. 5, 85–104 (2003)

  31. Piccolo, D.: Observed information matrix for MUB models. Quad. Stat. 8, 33–78 (2006)

  32. Piccolo, D.: Inferential issues on CUBE models with covariates. Commun Stat-Theor M 43 (2014). doi:10.1080/03610926.2013.821487

  33. Piccolo, D., D’Elia, A.: A new approach for modelling consumers’ preferences. Food Qual. Prefer. 19, 247–259 (2008)

  34. Tutz, G.: Regression for Categorical Data. Cambridge University Press, Cambridge (2012)

  35. Tutz, G., Schneider, M., Iannario, M., Piccolo, D.: Mixture models for ordinal responses to account for uncertainty of choice. Technical Report 175. Department of Statistics, University of Munich (2014)

Acknowledgments

This research was partially funded by STAR project (University of Naples Federico II—CUP: E68C13000020003) and partially by a grant from the European Union Seventh Framework Programme (FP7-SSH/2007-2013); SYRTO—Project ID: 320270.

Author information

Corresponding author

Correspondence to Marica Manisera.

Appendix

The aim of this Appendix is to clarify the rationale underlying the CUB and NLCUB models as well as some of their methodological issues, in order to allow a better understanding of the proposal in the paper. A number of issues and examples already published in previous papers [24, 26] are recalled to improve the readability of the paper.

In [26] the NLCUB formulation is derived as a special case of a more general setting, formally describing the Decision Process (DP) driving individuals’ responses to questions with ordered response levels. According to this general model, two different approaches, called (borrowing the CUB terminology) the feeling and the uncertainty approach, coexist in the DP and can determine the final answer. This happens unconsciously in the respondent’s mind.

The feeling approach accounts for any reasoned assessment and the set of emotions, sentiments and perceptions logically connected with the object being evaluated. It is assumed to proceed through T consecutive steps, called the feeling path. The idea is that, during his/her reasoning, the respondent screens all the positive and negative sensations randomly coming to his/her mind. Each new sensation is a step of the feeling path: at each step t, the respondent gives a very simple evaluation (e.g. positive or negative?) of the new sensation (basic judgment), summarizes the current and the previous basic judgments and transforms them into a rating \(r_t\) on the required scale (provisional rating), which is then progressively updated in the following steps of the feeling path. The rating generated by the feeling approach (when all the positive and negative sensations have been taken into account) is the last provisional rating \(r_T\).

On the other hand, the uncertainty approach derives from elements other than those accounted for in the reasoning of the feeling path, such as, for example, the unconscious willingness to delight the interviewer or the indecision deriving from the absence of a definite opinion on the object under evaluation. It just consists of a completely random judgment.

The expressed rating can derive from the feeling or the uncertainty approach with given probabilities. In this framework, a statistical model can be built by setting assumptions about (1) the distribution of basic judgments, (2) the function used to accumulate them and (3) the function used to transform them into Likert-scaled ratings. In fact, both CUB and NLCUB models derive from specific hypotheses on these three points.
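As a rough illustration of this framework, the following Python sketch simulates ratings from the two-approach DP. It is only a sketch under stated assumptions: basic judgments are taken to be independent binary draws, they are accumulated by counting the positive ones, and the transformation into a rating is passed in as a function. All names (`simulate_rating`, `n_steps`, `p_positive`, `to_rating`, `p_feeling`) are hypothetical and do not come from the paper.

```python
import random

def simulate_rating(rng, n_steps, p_positive, to_rating, p_feeling, m):
    """Draw one rating from the two-approach decision process.

    Assumptions of this sketch (not prescribed by the paper):
    (1) basic judgments are independent binary draws,
    (2) they are accumulated by counting the positive ones,
    (3) to_rating maps the running count to a rating in 1..m.
    """
    if rng.random() < p_feeling:
        # Feeling approach: walk along the feeling path.
        count = 0
        rating = to_rating(count)
        for _ in range(n_steps):
            count += rng.random() < p_positive  # basic judgment (1 = positive)
            rating = to_rating(count)           # updated provisional rating
        return rating                           # last provisional rating
    # Uncertainty approach: completely random rating on 1..m.
    return rng.randint(1, m)

rng = random.Random(0)
m = 5
ratings = [
    simulate_rating(rng, m - 1, 0.7, lambda c: c + 1, 0.8, m)
    for _ in range(10_000)
]
# Relative frequency of each rating 1..m in the simulated sample.
freq = [ratings.count(r) / len(ratings) for r in range(1, m + 1)]
```

Changing `n_steps` and `to_rating` is what distinguishes one model of this family from another; the mixing step stays the same.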

We briefly recall two numerical examples given in [26], showing how this DP leads to both CUB and NLCUB models.

Example 1

DP of CUB models: Suppose a person is asked to express a judgment about his/her satisfaction with a product by using a Likert scale from 1 to \(m=5\). In the feeling approach, the respondent asks him/herself \(T=m-1=4\) times ‘Do I have a positive sensation about this product? Yes or no?’ and gives a quick and instinctive response each time. The probability of answering ‘Yes’ is \(1-\xi \). In the end, 1 plus the total number of ‘Yes’ responses is the last rating \(r_T\) of the feeling path (Table 2). In the uncertainty approach, for a wide variety of reasons, the rating is drawn from a discrete Uniform distribution on \(\{1,2,\ldots ,5\}\). The expressed rating is generated by the feeling or the uncertainty approach with probabilities \(\pi \) and \(1-\pi \), respectively. The distribution of the ratings assumed by the CUB models and reported in (1) in Sect. 2 follows directly from this unconscious mechanism:

$$\begin{aligned} Pr(R=r;\mathbf{\theta })=\pi Pr(V(m,\xi )=r) + (1-\pi ) Pr(U(m)=r) \end{aligned}$$

for \(r=1,\ldots ,m\), with \(\mathbf{\theta }=(\pi ,\xi )'\), \(\pi \in (0,1]\), \(\xi \in [0,1]\).
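The mixture above can be evaluated directly. The sketch below is a minimal Python illustration, with hypothetical function names and arbitrary values \(\pi =0.8\), \(\xi =0.3\) chosen only for the example; it uses the fact that \(V(m,\xi )\) is a Binomial variable with \(m-1\) trials and success probability \(1-\xi \), shifted to the support \(\{1,\ldots ,m\}\).

```python
from math import comb

def shifted_binomial_pmf(r, m, xi):
    """Pr(V(m, xi) = r): Binomial(m - 1, 1 - xi) shifted to the support 1..m."""
    return comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)

def cub_pmf(m, pi, xi):
    """Return [Pr(R = 1), ..., Pr(R = m)] under the CUB mixture."""
    return [
        pi * shifted_binomial_pmf(r, m, xi) + (1 - pi) / m  # feeling + uncertainty
        for r in range(1, m + 1)
    ]

# Illustrative values only (not estimates from the paper).
probs = cub_pmf(m=5, pi=0.8, xi=0.3)
```

Setting `pi=0` gives the discrete Uniform distribution, while `pi=1` gives the pure shifted Binomial, which are the two extreme cases of the mixture.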

Table 2 DP of CUB models—feeling approach (example with \(m=5\))
Table 3 DP of NLCUB models—feeling approach (example with \(m=5\) and \(T=8\))

Example 2

DP of Nonlinear CUB models: As in Example 1, suppose a person is asked to express a judgment about his/her satisfaction with a product by using a Likert scale from 1 to \(m=5\). In the feeling approach, the person unconsciously asks him/herself \(T>m-1\) times ‘Do I have a positive sensation about this product? Yes or no?’ and gives a quick and instinctive response each time. The probability of answering ‘Yes’ is \(1-\xi \). For example, let T be equal to 8 (this value is unconsciously determined in the respondent’s mind). In the end, the last rating of the feeling path is still based on the total number of ‘Yes’ responses, as in Example 1, but in an ‘asymmetric’ way. So, for example, zero ‘Yes’ responses can lead to the rating \(r_T=1\); one or two ‘Yes’ responses to the rating \(r_T=2\); three, four, five or six ‘Yes’ responses to the rating \(r_T=3\); seven ‘Yes’ responses to the rating \(r_T=4\) and, finally, eight ‘Yes’ responses to the highest rating \(r_T=5\) (Table 3). In other words, according to this DP, the respondent finds it more difficult to move from rating 3 to rating 4 than from rating 1 to rating 2. As in Example 1, the final response can be \(r_T\) or a random rating resulting from the uncertainty approach, with probabilities \(\pi \) and \(1-\pi \), respectively. The distribution of the ratings assumed by the NLCUB models (see (2) in Sect. 2) is consistent with this mechanism:

$$\begin{aligned} Pr(R=r|k_0,k_1,\ldots ,k_m;\mathbf{\theta })=\pi \sum _{v=k_{r-1}+1}^{k_{r}} Pr(V(T+1,\xi )=v) + (1-\pi ) Pr(U(m)=r). \end{aligned}$$

It is worth recalling that \(V(T+1,\xi )\) is the Shifted Binomial random variable with trial parameter \(T+1\) and success probability \(1-\xi \), which corresponds to a Binomial random variable with trial parameter T, defined over the ‘shifted’ support \(\{1,\ldots ,T+1\}\) instead of \(\{0,\ldots ,T\}\). In this example, \(k_0=0\), \(k_1=1\), \(k_2=3\), \(k_3=7\), \(k_4=8\), and \(k_5=T+1=9\). Moreover, we have \(g_1=1\), \(g_2=2\), \(g_3=4\), \(g_4=1\), and \(g_5=1\) (while in Example 1 we have \(T=m-1=4\) and \(g_s=1\) for all \(s=1,\ldots ,5\)). So, in NLCUB, for \(s=1,\ldots ,m-1\), each value \(g_s\) represents the number of positive basic judgments that have to be accumulated in order to move from rating s to \(s+1\). In addition, it is easy to verify that \(T=8=(g_1+\cdots +g_m)-1\). The NLCUB models therefore differ from CUB only in the feeling approach.
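The NLCUB probabilities of Example 2 can be computed with a short Python sketch. It assumes, as the example values suggest, that the thresholds are the cumulative sums \(k_r=g_1+\cdots +g_r\) with \(k_0=0\); the function name `nlcub_pmf` and the values \(\pi =0.8\), \(\xi =0.3\) are hypothetical and serve only as an illustration.

```python
from itertools import accumulate
from math import comb

def nlcub_pmf(g, pi, xi):
    """Return [Pr(R = 1), ..., Pr(R = m)] for the NLCUB model defined by g."""
    m = len(g)
    T = sum(g) - 1
    # Thresholds k_0, ..., k_m, assumed to be cumulative sums of g (k_m = T + 1).
    k = [0] + list(accumulate(g))

    def v_pmf(v):
        # Pr(V(T + 1, xi) = v) for v = 1, ..., T + 1 (v - 1 'Yes' responses).
        return comb(T, v - 1) * (1 - xi) ** (v - 1) * xi ** (T + 1 - v)

    return [
        pi * sum(v_pmf(v) for v in range(k[r - 1] + 1, k[r] + 1)) + (1 - pi) / m
        for r in range(1, m + 1)
    ]

g_example = [1, 2, 4, 1, 1]                     # Example 2, so T = 8
thresholds = [0] + list(accumulate(g_example))  # k_0, ..., k_5
probs = nlcub_pmf(g_example, pi=0.8, xi=0.3)    # illustrative pi and xi
```

With `g = [1, 1, 1, 1, 1]` the same function reduces to the CUB distribution of Example 1, since each rating then corresponds to exactly one value of the shifted Binomial variable.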

It is worth noting that the values of \(\mathbf{g}\) and, consequently, T are determined on the basis of the data, according to a model selection mechanism. The values in the vector \(\mathbf{g}\) determine the probability distribution of one of the two random variables in the mixture. In other words, \(\mathbf{g}\) does not parametrize a given distribution, but defines the distribution itself. In this sense, although its values \(g_1,\ldots , g_m\) have to be assessed on the basis of the data, \(\mathbf{g}\) should not be considered a parameter, but rather a constituent of the model.

For this reason, parameter estimation of NLCUB models is performed with a two-step procedure [26]. In the first step, the Maximum Likelihood estimates of \(\pi \) and \(\xi \) are obtained for fixed \(\mathbf{g}\), that is, considering all the NLCUB models obtainable with different vectors \(\mathbf{g}\) such that \(g_1+\cdots +g_m \le T_{max} + 1\), where \(T_{max}\) is the maximum value for T, fixed by the user ([24] suggest fixing \(T_{max}=2m-1\)). At the end of this step, there is one NLCUB model for each configuration of \(\mathbf{g}\), along with the corresponding ML estimates of \(\pi \) and \(\xi \). In the second step, the ‘best’ model is selected with a model selection procedure, which can be based on different optimality criteria. At present, the basic method consists in choosing the model that maximizes the log-likelihood function. Since this is a crucial point, further research is currently devoted to this issue and several alternatives are being explored, dealing with the use of information criteria, such as BIC or AIC, the introduction of a penalization in the likelihood function, the evaluation of the model’s prediction performance on a test set with a cross-validation approach, or the development of a formal test.
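The two-step procedure can be sketched in Python as follows. This is only an illustration under explicit assumptions: the ML step is replaced by a coarse grid search over \(\pi \) and \(\xi \) (a stand-in for the numerical optimization actually used, which is not described here), the thresholds are taken as cumulative sums of \(\mathbf{g}\), and the observed frequencies in `counts` are made up. All function names are hypothetical.

```python
from itertools import accumulate, product
from math import comb, log

def nlcub_pmf(g, pi, xi):
    """NLCUB rating probabilities for the model defined by g."""
    m, T = len(g), sum(g) - 1
    k = [0] + list(accumulate(g))  # assumed thresholds k_0, ..., k_m
    # v[j] = probability of j 'Yes' responses out of T basic judgments.
    v = [comb(T, j) * (1 - xi) ** j * xi ** (T - j) for j in range(T + 1)]
    return [pi * sum(v[k[r]:k[r + 1]]) + (1 - pi) / m for r in range(m)]

def loglik(counts, probs):
    """Multinomial log-likelihood (up to a constant) of observed frequencies."""
    return sum(n * log(p) for n, p in zip(counts, probs) if n > 0)

def candidate_gs(m, t_max):
    """All vectors g of m positive integers with g_1 + ... + g_m <= t_max + 1."""
    for g in product(range(1, t_max + 2), repeat=m):
        if sum(g) <= t_max + 1:
            yield g

def fit_fixed_g(counts, g, grid):
    """Step 1 (grid-search stand-in for ML): best (loglik, pi, xi) for fixed g."""
    return max((loglik(counts, nlcub_pmf(g, pi, xi)), pi, xi)
               for pi in grid for xi in grid)

def two_step_fit(counts, t_max, grid):
    """Step 2: keep the configuration of g with the largest log-likelihood."""
    m = len(counts)
    return max((*fit_fixed_g(counts, g, grid), g) for g in candidate_gs(m, t_max))

counts = [12, 30, 45, 13]               # made-up frequencies for m = 4
grid = [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95
best = two_step_fit(counts, t_max=2 * len(counts) - 1, grid=grid)
```

Here `best` holds the maximized log-likelihood, the grid estimates of \(\pi \) and \(\xi \), and the selected vector \(\mathbf{g}\). Replacing the log-likelihood in the second step with BIC or AIC would be one way to try the alternative criteria mentioned above.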

Cite this article

Manisera, M., Zuccolotto, P. Treatment of ‘don’t know’ responses in a mixture model for rating data. METRON 74, 99–115 (2016). https://doi.org/10.1007/s40300-015-0075-2

  • DOI: https://doi.org/10.1007/s40300-015-0075-2

Keywords

  • Rating data
  • ‘Don’t know’ responses
  • Nonlinear CUB models
  • Eurobarometer
  • Missing values