Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness

  • Original Paper
The European Journal of Health Economics


Despite the growing popularity of the vignette methodology to deal with self-reported, categorical data, the formal evaluation of the validity of this methodology is still a topic of research. Some critical assumptions need to hold in order for this method to be valid. In this paper we analyse the assumption of “vignette equivalence” using data on health system responsiveness contained within the World Health Survey. We perform several tests to check the assumption of vignette equivalence. First, we use a test based on the global ordering of the vignettes. A minimal condition for the assumption of vignette equivalence to hold is that individual responses are consistent with the global ordering of vignettes. Secondly, using the hierarchical ordered probit (HOPIT) model on the pool of countries, we undertake sensitivity analyses, stratifying countries according to the Inglehart–Welzel scale and the Human Development Index. The results of this analysis are robust, suggesting that the vignette equivalence assumption is not contradicted. Thirdly, we model the reporting behaviour of the respondents through a two-step regression procedure to evaluate whether the vignette construct is perceived by respondents in different ways. Overall, across the analyses the results do not contradict the assumption of vignette equivalence and accordingly lend support to the use of the vignette methodology when analysing self-reported data and health system responsiveness.


  1. Other studies focus on the assumption of response consistency when trying to assess the validity of the anchoring vignettes methodology, i.e. [6, 15, 17].

  2. The long-form questionnaire uses two question items per domain, while the short-form questionnaire uses only one. We use the eight items that are common to the long- and short-form questionnaires.

  3. This map has been utilised to assess the validity of the vignette equivalence assumption also by Kristensen and Johansson [11].

  4. “Self-Secular” = Austria, Belgium, Denmark, Germany, Spain, Finland, France, Great Britain, Greece, Israel, Italy, Luxembourg, Netherlands, Slovenia, Sweden. “Self-Traditional” = Brazil, Dominican Republic, Ecuador, Guatemala, Ireland, Portugal, Uruguay. “Survival-Traditional” = United Arab Emirates, Burkina Faso, Bangladesh, Chad, Cote d’Ivoire, Congo, Comoros, Ethiopia, Ghana, India, Kenya, Lao, Sri Lanka, Malaysia, Mauritania, Malay, Morocco, Myanmar, Mauritius, Malawi, Namibia, Nepal, Pakistan, Philippines, Senegal, Swaziland, Tunisia, South Africa, Zambia, Zimbabwe. “Survival-Secular” = Bosnia, China, Croatia, Czech Republic, Georgia, Hungary, Kazakhstan, Latvia, Russia, Slovakia, Ukraine, Vietnam.

  5. For an example of consistent vignette ordering, consider Murray et al. [19], Fig. 30.3.

  6. The average is computed assigning the same weight to each country within a group.

  7. As an example, “running a marathon” could be viewed as a multidimensional construct. Some individuals may view running a marathon as evidence of a high level of mobility and some as a result of exceptional talent. Others might consider it an attribute related to health, whilst others might consider it an attribute related to sport [19].

  8. Perfect agreement of the rankings leads to a coefficient of 1, perfect disagreement −1, and independence 0.

  9. We do not include in the analysis individuals who gave the same evaluation of all the vignettes (e.g. they judge all the vignettes as excellent responsiveness), because for these individuals it is not possible to compute the Spearman rank order correlation coefficient between their ranking and the global ordering ranking. However, we perform a robustness check that includes in the sample the respondents who gave the same evaluation of all the vignettes. Referring to the domain “Confidentiality”, we perform the robustness check by moving just one vignette by one rank, in a way consistent with the global ordering. The results obtained including these observations are extremely similar to those not including them.

  10. The average SROCCs have been computed assuming equal weight for each individual.

  11. We exclude only Australia, Norway and Turkey since data on “Dignity” are not available for these countries.

  12. See pp. 463–464 of Kapteyn et al. [5] for a formal description of the model estimated by the authors.

  13. This set of vignettes is coded as Set A in the WHS. We are unable to perform our analysis on a pool of all the vignettes contained in the responsiveness module, since each set is evaluated by a different group of respondents.

  14. The first vignette of the set (q7501) is assumed to be the base category.

  15. The strategy adopted by STATA (the software we utilize for the empirical estimates) for identification in the ordered probit model is to set the constant term to zero. Therefore, we assume the coefficient of the base reference vignette-dummy to be equal to zero.

  16. Australia, Turkey and Guatemala are excluded from the analysis since data on vignettes are not reported for all the domains considered.

  17. The coefficient of variation of the number of alternative orderings is 14.35, while for the number of SROCCs that occur with a frequency greater than 1% it is 0.91.

  18. For each domain, we have computed the median SROCC on the basis of tables analogous to Table 4.

  19. We are not aware of any study that explicitly defines a threshold of acceptability for the rank order correlation coefficient above which we can assume that vignette equivalence holds. However, according to Murray et al. [19], a rank order correlation coefficient greater than 0.9 strongly corroborates the assumption of vignette equivalence.

  20. Only the result related to the first cut point in the reporting bias equations is reported in Tables 8 and 9. Results related to the other cut points are available on request.

  21. The results of the first and second step regression are available on request.

  22. The results are not affected by the distribution of the gender of individuals across vignettes, since both women and men are represented in vignettes describing high and low levels of responsiveness.
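
Notes 8 and 9 can be illustrated with a short sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: it computes a Spearman rank order correlation coefficient (SROCC) between one respondent's vignette ratings and a global ordering, using average ranks for ties, and returns no value for a respondent who rates every vignette identically. The function names, the five-vignette global ordering and the example ratings are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def spearman(ratings, global_ordering):
    """Pearson correlation of the average ranks; None if either side is constant."""
    rx, ry = average_ranks(ratings), average_ranks(global_ordering)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # same evaluation for all vignettes: coefficient undefined
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / sqrt(sxx * syy)


# Hypothetical global ordering of five vignettes (1 = lowest responsiveness).
global_ordering = [1, 2, 3, 4, 5]

agree = spearman([1, 2, 3, 4, 5], global_ordering)    # same ordering
reverse = spearman([5, 4, 3, 2, 1], global_ordering)  # opposite ordering
flat = spearman([5, 5, 5, 5, 5], global_ordering)     # identical ratings
```

Returning `None` for constant ratings mirrors the exclusion described in note 9: with no variation in a respondent's ratings, the correlation has a zero denominator.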


  1. Murray, C., Frenk, J.: A framework for assessing the performance of health systems. Bull. World Health Org. 78, 717–731 (2000)

  2. Valentine, N., De Silva, A., Kawabata, K., Darby, C., Murray, C.J.L., Evans, D.: Health system responsiveness: concepts, domains and operationalization. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 573–596. World Health Organization, Geneva (2003)

  3. Salomon, J., Tandon, A., Murray, C.J.L., World Health Survey Pilot Study Collaborating Group: Comparability of self-rated health: cross sectional multi-country survey using anchoring vignettes. Brit. Med. J. 328(258), 258–261 (2004)

  4. Bago d’Uva, T., van Doorslaer, E., Lindeboom, M., O’Donnell, O.: Does reporting heterogeneity bias the measurement of health disparities? Health Econ. 17(3), 351–375 (2008)

  5. Kapteyn, A., Salomon, J., van Soest, A.: Vignettes and self-reports of work disability in the US and the Netherlands. AER 97(1), 461–473 (2007)

  6. Van Soest, A., Delaney, A., Harmon, C., Kapteyn, A., Smith, J.P.: Validating the use of vignettes for subjective threshold scales. Discussion Paper, Tilburg University (2007)

  7. King, G., Murray, C.J.L., Salomon, J., Tandon, A.: Enhancing the validity and cross-cultural comparability of measurement in survey research. Am. Polit. Sci. Rev. 98(1), 184–191 (2004)

  8. Rice, N., Robone, S., Smith, P.C.: Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness. HEDG Working Paper, University of York (2009)

  9. Rice, N., Robone, S., Smith, P.C.: International comparison of public sector performance: the use of anchoring vignettes to adjust self-reported data. Evaluation 16(1), 81–101 (2010)

  10. Sirven, N., Santos-Eggimann, B., Spagnoli, J.: Comparability of health care responsiveness in Europe using anchoring vignettes from SHARE. IRDES working paper DT15 (2008)

  11. Kristensen, N., Johansson, E.: New evidence on cross-country differences in job satisfaction using anchoring vignettes. Labour Econ. 15, 96–117 (2008)

  12. Hsee, C.K., Tang, J.N.: Sun and water: on a modulus-based measurement of happiness. Emotion 7, 213–218 (2007)

  13. Javaras, K.N., Ripley, B.D.: An “unfolding” latent variable model for likert attitude data: drawing inferences adjusted for response style. JASA 102(478), 454–463 (2007)

  14. Grzymala-Busse, A.: Rebuilding Leviathan: party competition and state exploitation in post-communist democracies. Cambridge University Press, New York (2007)

  15. Bago d’Uva, T., Lindeboom, M., O’Donnell, O., van Doorslaer, E.: Slipping anchor? Testing the vignettes approach to identification and correction of reporting heterogeneity. HEDG Working Paper, University of York (2009)

  16. Hopkins, D., King, G.: Improving anchoring vignettes: designing surveys to correct interpersonal incomparability. Public Opin. Q. (2010) (in press)

  17. Gupta, N., Kristensen, N., Pozzoli, D.: External validation of the use of vignettes in cross-country health studies. IZA Discussion Paper (2008)

  18. Wand, J.: Credible comparisons using interpersonally incomparable data: ranking self-evaluations relative to anchoring vignettes or other common survey questions. Mimeo (2007)

  19. Murray, C.J.L., Ozaltin, E., Tandon, A.J., Salomon, J.: Empirical evaluation of the anchoring vignettes approach in health surveys. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 369–399. World Health Organization, Geneva (2003)

  20. King, G., Wand, J.: Comparing incomparable survey responses: new tools for anchoring vignettes. Polit. Anal. 15(1, Winter), 46–66 (2007)

  21. Kapteyn, A., Salomon, J., van Soest, A.: Are Americans really less happy with their incomes? Rand Working Paper (2008)

  22. Üstün, T.B., Chatterji, S., Mechbal, A., Murray, C.: The world health surveys. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 762–796. World Health Organization, Geneva (2003)

  23. Valentine, N., Prasat, A., Rice, N., Robone, S., Chatterji, S.: Health systems responsiveness—a measure of the acceptability of health care processes and systems. In: Mossialos, E., Smith, P., Leatherman, S. (eds.) Performance measurement for health system improvement: experiences, challenges and prospects, pp. 256–305. WHO European Regional Office, London (2009)

  24. Ferguson, B.D., Tandon, A., Gakidou, E., Murray, C.J.L.: Estimating permanent income using indicator variables. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 748–760. World Health Organization, Geneva (2003)

  25. United Nations Development Programme: Human Development Report. New York (2006)

  26. Inglehart, R.: Inglehart–Welzel cultural map of the World.

  27. Tandon, A., Murray, C.J.L., Salomon, J.A., King, G.: Statistical models for enhancing cross-population comparability. In: Murray, C.J.L., Evans, D.B. (eds.) Health systems performance assessment: debates, methods and empiricism, pp. 727–746. World Health Organization, Geneva (2003)

  28. Terza, J.V.: Ordinal probit: a generalization. Commun. Stat. 14(1), 1–11 (1985)

  29. Lewis, J.B., Linzer, D.A.: Estimating regression models in which the dependent variable is based on estimates. Polit. Anal. 13, 345–364 (2005)

  30. Jones, A.M., Rice, N., D’Uva, T.B., Balia, S.: Applied health economics. Routledge, New York (2007)

  31. Efron, B.: The Jackknife, the Bootstrap and other resampling plans. Society for Industrial and Applied Mathematics, Philadelphia (1982)



This research was funded by the Economic and Social Research Council under the Public Services Programme, grant number RES-166-25-0038, and under the ESRC Large Grant Scheme, grant number RES-060-25-0045. We would like to thank the World Health Organization for providing access to the World Health Survey and, in particular, Somnath Chatterji, Amit Prasad, Nicole Valentine and Emese Verdes. We are also grateful to the Health, Econometrics and Data Group Seminar Series at the University of York and to the Health Economics Seminars, Erasmus University for helpful comments on an earlier draft.

Author information


Corresponding author

Correspondence to Silvana Robone.

Appendix: The HOPIT model

Reporting behaviour equation

To identify the thresholds as a function of respondent covariates, let \( R_{ik}^{v * } \) represent the underlying health system responsiveness for vignette k, rated by individual i. Given that each vignette is fixed and unrelated to a respondent’s characteristics, it is assumed that the expected value of the underlying latent scale depends solely on the corresponding vignette, such that:

$$ R_{ik}^{v * } = K_{ik} \eta_{k} + \varepsilon_{ik}^{v} ,\quad \varepsilon_{ik}^{v} |K_{ik} \sim N\left( {0,1} \right) \tag{1} $$

where \( K_{ik} \) is the vector of vignettes, \( \eta_{k} \) is a conformably dimensioned vector of parameters and \( \varepsilon_{ik}^{v} \) is an idiosyncratic error term. \( R_{ik}^{v * } \) is unobservable to the researcher; instead we observe the vignette rating, \( r_{ik}^{v} \), on a five-point scale ranging from ‘very bad’ to ‘very good’. We assume the observed category of \( r_{ik}^{v} \) is related to \( R_{ik}^{v * } \) through the following mechanism:

$$ r_{ik}^{v} = j\quad {\text{if}}\;\mu_{i}^{j - 1} \le R_{ik}^{v * } < \mu_{i}^{j} \quad {\text{for}}\;\mu_{i}^{0} = - \infty ,\;\mu_{i}^{5} = \infty ,\;\forall \,i,\,k;\quad j = 1, \ldots ,5 \tag{2} $$

Should the thresholds represent fixed constants, \( \mu^{j} \), common to all individuals, then the above mapping is that of the standard ordered probit model. For the HOPIT model the thresholds are assumed to be functions of covariates, X, such that:

$$ \mu_{i}^{j} = X_{i} \gamma^{j} \tag{3} $$

where \( \gamma^{j} ,\;j = 1, \ldots ,4 \) are parameters to be estimated along with \( \eta_{k} \) (the top threshold, \( \mu_{i}^{5} = \infty \), is fixed). Further, we assume an ordering of the thresholds such that \( \mu_{i}^{1} < \mu_{i}^{2} < \cdots < \mu_{i}^{5} . \) If we impose the restriction that the covariates affect all thresholds by the same magnitude, then we have parallel cut-point shift. However, if the degree of reporting heterogeneity varies across thresholds such that it is greater at some levels of responsiveness than others, we refer to this as non-parallel shift [30].
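
To make the reporting behaviour mechanism concrete, the following is a minimal sketch, assuming made-up coefficients rather than estimates from the paper. It builds respondent-specific thresholds as linear functions of covariates and maps a latent value to one of the five categories. The function names, the two-element covariate vector (a constant plus one hypothetical dummy) and all numeric values are assumptions for illustration only.

```python
from bisect import bisect_right


def thresholds(x, gammas):
    """Finite thresholds mu_i^j = X_i gamma^j, one per coefficient vector."""
    return [sum(xi * g for xi, g in zip(x, gamma)) for gamma in gammas]


def category(latent, mu):
    """Return j in 1..5 such that mu^(j-1) <= latent < mu^j.

    mu holds the four finite thresholds in increasing order; the outer
    thresholds (minus and plus infinity) are implicit.
    """
    return bisect_right(mu, latent) + 1


# Hypothetical coefficients: the second covariate shifts every threshold
# by the same amount (0.5), i.e. a parallel cut-point shift.
gammas = [[-1.0, 0.5], [0.0, 0.5], [1.0, 0.5], [2.0, 0.5]]

mu_a = thresholds([1.0, 0.0], gammas)  # respondent with dummy = 0
mu_b = thresholds([1.0, 1.0], gammas)  # respondent with dummy = 1

# The same latent level of responsiveness is reported differently.
rating_a = category(0.2, mu_a)
rating_b = category(0.2, mu_b)
```

In this example both respondents face the same latent value, yet the second reports a lower category because all of their thresholds sit 0.5 higher. Giving the second covariate a different coefficient in each row of `gammas` would produce a non-parallel shift instead.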

Responsiveness equation

Underlying health system responsiveness faced by individual i can be expressed as:

$$ R_{i}^{s * } = Z_{i} \beta + \varepsilon_{i}^{s} ,\quad \varepsilon_{i}^{s} |Z_{i} \sim N\left( {0,\,\sigma^{2} } \right) \tag{4} $$

where \( Z_{i} \) represents a set of regressors predictive of responsiveness. As with the vignettes, \( R_{i}^{s * } \) represents an unobserved latent variable and we assume that the observed categorical response, \( r_{i}^{s} \), relates to \( R_{i}^{s * } \) in the following way:

$$ r_{i}^{s} = j\quad {\text{if}}\;\mu_{i}^{j - 1} \le R_{i}^{s * } < \mu_{i}^{j} \quad {\text{for}}\;\mu_{i}^{0} = - \infty ,\;\mu_{i}^{5} = \infty ,\;\forall \,i;\quad j = 1, \ldots ,5 \tag{5} $$

where \( \mu_{i}^{j} \) are defined by (3) with \( \gamma^{j} \) fixed, and it is assumed that \( R_{ik}^{v * } \) and \( R_{i}^{s * } \) are independent for all \( i = 1, \ldots ,N \) and \( k = 1, \ldots ,V. \) Note that \( \sigma^{2} \) in Eq. 4 is identified because the thresholds are fixed through the reporting behaviour equation. It follows that the probabilities associated with each of the five categories are given by:

$$ \Pr \left( {r_{i}^{s} = j} \right) = \Phi \left( {\frac{{\mu_{i}^{j} - Z_{i} \beta }}{\sigma }} \right) - \Phi \left( {\frac{{\mu_{i}^{j - 1} - Z_{i} \beta }}{\sigma }} \right),\quad j = 1, \ldots ,5 \tag{6} $$

where \( \Phi ( \cdot ) \) is the standard normal cumulative distribution function.
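
The category probabilities can be checked numerically with a short sketch. This is an illustration under stated assumptions, not the estimation code used in the paper: the function name and the threshold and index values are hypothetical, and the argument of the normal distribution function is divided by sigma because the error term in the responsiveness equation has variance sigma squared (setting `sigma=1.0` gives the unscaled form).

```python
from math import inf
from statistics import NormalDist


def category_probabilities(mu, z_beta, sigma=1.0):
    """Probabilities of categories 1..5 given four finite thresholds mu,
    the linear index z_beta = Z_i beta and the error standard deviation."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    cuts = [-inf] + list(mu) + [inf]
    cdf = [std_normal.cdf((c - z_beta) / sigma) for c in cuts]
    # Difference of adjacent cumulative probabilities gives each category.
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cuts))]


# Hypothetical thresholds and index value.
probs = category_probabilities([-1.0, 0.0, 1.0, 2.0], z_beta=0.0, sigma=1.0)
total = sum(probs)
```

Because the outer thresholds are minus and plus infinity, the five probabilities always sum to one, which is a quick sanity check on any set of estimated thresholds.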

Rice, N., Robone, S. & Smith, P. Analysis of the validity of the vignette approach to correct for heterogeneity in reporting health system responsiveness. Eur J Health Econ 12, 141–162 (2011).
