
Evaluation of the link between the Guttman errors and response shift at the individual level



Methods for response shift (RS) detection at the individual level could be of great interest when analyzing changes in patient-reported outcome data. Guttman errors (GEs), which measure discrepancies in respondents’ answers compared to the average sample responses, might be useful for detecting RS at the individual level between two time points, as RS may induce an increase in the number of discrepancies over time. This study aims to establish the link between recalibration RS and the change in the number of GEs over time (denoted index \(I\)) via simulations and explores the discriminating ability of this index.


We simulated the responses of individuals affected or not affected by recalibration RS (defined as changes in the patients’ standard of measurement) to determine whether simulated individuals with recalibration had a greater change in the number of GEs over time than individuals without recalibration. The effects of factors related to the sample, the questionnaire structure and recalibration were investigated. As an illustrative example, the change in the number of GEs was computed in patients suffering from eating disorders.


In the simulations, individuals affected by recalibration had, on average, a greater change in the number of GEs over time than did individuals without RS. Some of the parameters related to the questionnaire structure and recalibration magnitude appeared to have substantial effects on the values of \(I\). However, the discriminating ability of the index appeared low overall.


Some evidence of the link between recalibration and the change in GEs was found in this study. GEs could be a valuable nonparametric tool for RS detection at a more individual level, but further investigation is needed.


Data availability

Modules, scripts and an extract of the simulated data used in the paper are available at the Open Science Framework via the link:


  1. i.e., nonrandom errors in the latent variable estimates.




We would like to warmly thank all the staff members of the EVALADD cohort.


Y. Dubuy received a national grant from the French Ministry of Higher Education, Research and Innovation. The EVALADD cohort is sponsored by Nantes University Hospital (CHU Nantes).

Author information



Corresponding author

Correspondence to Yseulys Dubuy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

The EVALADD cohort (Investigator: M. Grall-Bronnec) was approved by the local Research Ethics Committee (Groupe Nantais d’Ethique dans le Domaine de la Santé), by the CCTIRS (Comité Consultatif sur le Traitement de l'Information en matière de Recherche dans le domaine de la Santé) and by the CNIL (Commission Nationale de l'Informatique et des Libertés).

Informed consent

All participants provided written informed consent (for under 18-year-olds, a legal representative provided informed consent), in accordance with the Helsinki declaration.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 4778 kb)

Supplementary file2 (PDF 317 kb)

Appendix 1: simulation implementation


Longitudinal partial credit model

The longitudinal Partial Credit Model (LPCM) was chosen to generate data because it models the response category probabilities of polytomous items forming a unidimensional scale across time and makes it possible to simulate RS for a varying proportion of patients. Under the LPCM, the probability that patient \(n\) answers \(m~(=0,\dots , M-1)\) to item \(j\) at time \(t\) is given by:

$$P\left(X_{nj}^{(t)}=m \mid \theta_{n}^{(t)}, \delta_{j1}^{(t)},\dots,\delta_{jM-1}^{(t)}\right)=\frac{\exp\left(m\,\theta_{n}^{(t)}-\sum_{p=1}^{m}\delta_{jp}^{(t)}\right)}{\sum_{l=0}^{M-1}\exp\left(l\,\theta_{n}^{(t)}-\sum_{p=1}^{l}\delta_{jp}^{(t)}\right)}$$

where \(X_{nj}^{(t)}\) denotes the response of individual \(n\) to item \(j=1,\dots , J\) at time \(t\), and \(\theta_{n}^{(t)}\) stands for the latent variable level of individual \(n\) at time \(t\) (a realization of the random variable \(\Theta^{(t)}\)), with

$$\left(\begin{array}{c}\Theta^{(t_1)}\\ \Theta^{(t_2)}\end{array}\right)\sim N\left(\left[\begin{array}{c}\mu_{1}\\ \mu_{2}\end{array}\right],\ \Sigma =\left[\begin{array}{cc}\sigma_{1}^{2}& \sigma_{1,2}\\ \sigma_{1,2}& \sigma_{2}^{2}\end{array}\right]\right)$$

\(\delta_{jm}^{(t)}\) is the difficulty of the response category \(m=1,\dots , M-1\) of item \(j\) at time point \(t\). If \(\delta_{jm}^{(t)}\) is low, the proportion of patients scoring \(m\) or more on item \(j\) will be high: \(m\) is hence an easy response category (and vice versa for difficult response categories). The null response category (\(m=0\)) does not have a difficulty parameter.
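As an illustration of the formula above, the following minimal sketch computes the LPCM category probabilities for one item at one time point and draws a response from them. It is not the authors' simulation code: the function names `lpcm_probabilities` and `simulate_response` and the example parameter values are hypothetical, and only the Python standard library is assumed.

```python
import math
import random

def lpcm_probabilities(theta, deltas):
    """Category probabilities for one item at one time point (hypothetical helper).

    theta  : latent variable level of the individual at that time point
    deltas : difficulties [delta_j1, ..., delta_j(M-1)] of the positive categories
    Returns [P(X = 0), ..., P(X = M-1)].
    """
    # The exponent for category m is m * theta minus the sum of the first m
    # difficulties; the empty sum gives an exponent of 0 for the null category.
    exponents = [0.0]
    cumulative = 0.0
    for m, delta in enumerate(deltas, start=1):
        cumulative += delta
        exponents.append(m * theta - cumulative)
    # Subtract the largest exponent before exponentiating for numerical stability;
    # this leaves the ratio in the formula unchanged.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_response(theta, deltas, rng):
    """Draw one response in {0, ..., M-1} from the category probabilities."""
    probs = lpcm_probabilities(theta, deltas)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Illustrative values only (not taken from the paper): a 4-category item.
example_probs = lpcm_probabilities(theta=0.0, deltas=[-1.0, 0.0, 1.0])
example_response = simulate_response(0.0, [-1.0, 0.0, 1.0], random.Random(1))
```

Lowering all the difficulties of an item while keeping \(\theta\) fixed moves probability mass toward the higher response categories, which is the sense in which a category becomes "easier".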

At the first measurement occasion, difficulty parameters were chosen to be spaced along the latent variable continuum (assumed to be normally distributed, with a mean of 0 and a standard deviation of 1). For each item \(j\), the difficulty parameter of the first positive response category (denoted \(\delta_{j1}^{(t_1)}\)) equaled the \(\frac{j}{J+1}\)th quantile of a \(N(0,1)\) distribution. The difficulty parameters of the following response categories were then regularly shifted from the first one: \(\delta_{jm}^{(t_1)}=\delta_{j1}^{(t_1)}+(m-1)\times \frac{2}{M-2}\). Finally, the difficulty parameters of all items were centered by subtracting their mean \(\overline{\delta}=\frac{\sum_{j,m}\delta_{jm}^{(t_1)}}{J(M-1)}\), so that they were centered on the mean of the latent variable distribution (i.e., 0). This corresponds to the situation where the questionnaire is suitable for a population whose latent variable follows a standard normal distribution. Because the spacing between successive category difficulties is the same for all items, the model at the first measurement occasion is a rating scale model.
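The construction described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' script: the function name `baseline_difficulties` is hypothetical, and the guard on \(M\) is an assumption added because the spacing \(\frac{2}{M-2}\) is undefined for two response categories.

```python
from statistics import NormalDist

def baseline_difficulties(J, M):
    """Difficulties at the first occasion as a J x (M-1) nested list (hypothetical helper).

    J : number of items
    M : number of response categories per item (0, ..., M-1)
    """
    if M < 3:
        # Assumption: the spacing 2 / (M - 2) needs at least 3 categories.
        raise ValueError("M must be at least 3")
    step = 2.0 / (M - 2)
    std_normal = NormalDist(0.0, 1.0)
    deltas = []
    for j in range(1, J + 1):
        # First positive category: the j/(J+1) quantile of N(0, 1).
        first = std_normal.inv_cdf(j / (J + 1))
        # Following categories: regularly shifted from the first one.
        deltas.append([first + (m - 1) * step for m in range(1, M)])
    # Center all difficulties on their overall mean so that they are
    # centered on 0, the mean of the latent variable distribution.
    mean = sum(sum(row) for row in deltas) / (J * (M - 1))
    return [[d - mean for d in row] for row in deltas]

# Illustrative call: 4 items with 4 response categories each.
example = baseline_difficulties(J=4, M=4)
```

In the result, every item has the same gaps between successive category difficulties and items differ only in location, which matches the rating scale structure mentioned in the text.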

Recalibration operationalization

To simulate the responses of patients affected by uniform recalibration (UR) at \(t_2\), we chose to shift all the difficulty parameters of the item(s) affected by recalibration by −1, making all response categories easier. For patients affected by nonuniform recalibration (NUR), difficulty parameters were differentially shifted by values ranging from 0 to \(2\eta\), with \(\eta =1.8\): the first positive response category kept the same difficulty parameter over time, while the other categories became more difficult. Finally, we kept the difficulty parameters constant over time to simulate the responses of patients not affected by RS.

$$\text{For all } m \in \left\{1,\dots ,M-1\right\},\quad \delta_{jm}^{(t_2)} = \left\{ \begin{array}{ll} \delta_{jm}^{(t_1)}+\eta_m & \text{for individuals affected by RS} \\ \delta_{jm}^{(t_1)} & \text{for individuals not affected by RS} \end{array} \right.$$

For UR, \(\eta_{m}^{UR}=-1\) for all \(m \in \left\{1,\dots ,M-1\right\}\).

For NUR, with \(\eta =1.8\):

$$\eta_{m}^{NUR}=\left\{\begin{array}{ll}\frac{(m-1)\,\eta}{m} & \text{if } 1 \leq m<\frac{M}{2} \\ \eta & \text{if } m=\frac{M}{2} \\ \frac{(M-m+1)\,\eta}{M-m} & \text{if } \frac{M}{2}<m \leq M-1 \end{array}\right.$$
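To make the shifts concrete, the sketch below evaluates \(\eta_m\) for both types of recalibration and applies it to the difficulties of one item. The function names `recalibration_shifts` and `apply_recalibration`, the string labels `"UR"` and `"NUR"`, and the example difficulties are hypothetical and used for illustration only.

```python
def recalibration_shifts(M, kind, eta=1.8):
    """Shifts [eta_1, ..., eta_(M-1)] for the positive categories (hypothetical helper).

    kind : "UR" for a constant shift of -1, "NUR" for the category-dependent shift
    """
    categories = range(1, M)
    if kind == "UR":
        return [-1.0 for _ in categories]
    if kind == "NUR":
        shifts = []
        for m in categories:
            if 2 * m < M:        # 1 <= m < M/2
                shifts.append((m - 1) * eta / m)
            elif 2 * m == M:     # m = M/2 (only possible when M is even)
                shifts.append(eta)
            else:                # M/2 < m <= M-1
                shifts.append((M - m + 1) * eta / (M - m))
        return shifts
    raise ValueError("kind must be 'UR' or 'NUR'")

def apply_recalibration(deltas_t1, shifts):
    """Difficulties at t2 for an individual affected by RS: delta_t1 + eta_m."""
    return [d + s for d, s in zip(deltas_t1, shifts)]

# Illustrative values only: one item with M = 4 response categories.
nur_shifts = recalibration_shifts(4, "NUR")
deltas_t2 = apply_recalibration([-1.0, 0.0, 1.0], nur_shifts)
```

With \(M=4\), the NUR shifts evaluate to \(0\), \(\eta\) and \(2\eta\): the first positive category is unchanged and the last one moves the most, in line with the stated range of 0 to \(2\eta\).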


About this article


Cite this article

Dubuy, Y., Sébille, V., Grall-Bronnec, M. et al. Evaluation of the link between the Guttman errors and response shift at the individual level. Qual Life Res (2021).



  • Response shift
  • Guttman errors
  • Recalibration
  • Individual level