Background

A well-studied outcome in trauma care is health-related quality of life (HRQL) [1]. HRQL is used to estimate the impact of an injury on a patient's life and enables evaluation of the quality of care [2]. Measurement of change in HRQL, individual or aggregate, has been used to evaluate health interventions in a wide range of conditions and populations (e.g. [3,4,5,6,7]). Inaccurate measurement of change in HRQL may therefore affect clinical practice and health care, and ultimately the quality of care and HRQL of patients. In an observational context, this change provides insight into patterns of recovery over time [8, 9]. Understanding recovery patterns helps clinicians set expectations and identify, in a timely manner, specific patient groups with persistently lower HRQL. Knowing who faces a poor prognosis may guide the development and application of targeted interventions to counter this trajectory.

However, conventionally measured change in HRQL may not always reflect the change in HRQL as perceived or experienced by the patient. Conventionally measured change in HRQL is defined as the difference between direct measurements of HRQL on two consecutive occasions. The patient's perceived (retrospective) change in HRQL is defined as the difference between the directly measured current HRQL and the HRQL that the patient, at the current occasion, reports for a specified previous occasion. Indeed, McPhail et al. showed that agreement between conventional and retrospective change in HRQL was not strong, and attributed a large proportion of the disagreement to so-called recall bias [10]. Recall bias is defined as a systematic measurement error due to memory decay, that is, the fading of memory with time. From the current standpoint, past health may be remembered as worse or better than it actually was; the direction depends on psychological mechanisms that keep better or worse memories more vivid [11]. The magnitude of recall bias may depend on the scale used to measure HRQL: subjective scales, such as a visual analogue scale (VAS), may be more easily distorted than classification-like scales, such as the EQ-5D [12, 13]. Recall bias in HRQL has been observed among patients with, e.g., multiple sclerosis, psoriasis, cancer, injury and total hip arthroplasty [14,15,16,17]. A study among patients with traumatic brain injury showed that recall bias was stronger in patients with more severe symptoms, including memory problems [18]. This suggests that recall bias may be larger among patients who experience memory problems than among their counterparts.

Response shift may also contribute to disagreement between conventional and retrospective change in HRQL. Response shift is a true change in a patient's perspective on the targeted construct, caused by a change in internal standards, a change in values, and/or a redefinition of the construct [19, 20]. This may alter the direction of the measured change. Among trauma patients, response shift may occur between successive post-injury HRQL measurements as patients adapt to their ill health [19, 20]. However, the magnitude of response shift may vary with the type and severity of injury. A study among patients with multiple sclerosis showed that greater disability was associated with a change in internal standards for certain HRQL dimensions [21], suggesting that response shift may be stronger among patients with more severe injuries. Consequently, the contribution of response shift to disagreement between conventional and retrospective change may also vary across subgroups of trauma patients.

McPhail et al. were the first to investigate response shift and recall bias simultaneously, in a sample of 101 elderly hospitalized patients [10]. The investigators argued that the contributions of response shift and recall bias may differ in other patient groups. This may particularly be the case for trauma patients, since injuries comprise heterogeneous patterns of ill health and may affect patients of all age groups.

The aims of this study were to measure conventional and retrospective change in HRQL, using the EQ-5D-3L and the EQ-VAS, and to assess the extent to which recall bias and response shift contribute to disagreement between them, in a heterogeneous sample of trauma patients.

Hypotheses

We tested the following hypotheses:

  • Agreement between conventional and retrospective change in HRQL is lower when HRQL is measured with the EQ-VAS than when it is measured with the EQ-5D-3L, because recall bias and response shift distort subjective scales (such as a VAS) more easily than classification-like scales such as the EQ-5D-3L.

  • Agreement between conventional and retrospective change in HRQL is higher among trauma patients with less severe injuries (ISS < 16), because low-impact trauma requires less adaptation to one's (final) health status than severe trauma.

  • Recall bias, rather than response shift, causes disagreement between conventional and retrospective change, because over the chosen time lapse (3 months) memory problems affecting recall are no longer trivial.

  • In older patients, patients with traumatic brain injury (TBI) and patients with posttraumatic stress disorder (PTSD), recall bias is larger than in their counterparts, because these patients experience more memory problems.

Methods

Study design

This study uses data from a registry-based study of injury patients in Noord-Brabant (2.5 million inhabitants), the Netherlands. This prospective longitudinal cohort study, the Brabant Injury Outcome Surveillance (BIOS) study, assessed outcomes in trauma patients admitted to one of the ten hospitals in the Noord-Brabant region [22]. The BIOS study includes multiple HRQL measurements up to 24 months after injury. Response shift items and recall questions were intentionally included in the 3-month follow-up survey. Ethical approval for the observational data analysis of this study was received from the Medical Ethics Committee Brabant (NL50258.028.14).

BIOS study population

All trauma patients (≥18 years) who attended the Emergency Department (ED), were admitted to an Intensive Care Unit (ICU) or ward of one of the ten hospitals between November 2015 and November 2016, and were discharged alive qualified for inclusion. Patients were excluded if they were unable to understand or answer Dutch-language questionnaires, had a pathological fracture due to a primary malignancy, or had no permanent address [22].

Eligible patients were invited to participate in the BIOS study by postal invitation one week after hospital admission. The invitation was accompanied by an informed consent form and the first survey (T1); ethical permission was obtained for this procedure. Non-responders received a telephone call to discuss their participation. The 3-month follow-up survey (T2) was sent to patients whose consent form and completed T1 survey had been received by the researchers. For the present study, we included data from patients who completed both surveys.

Self-report measures

T1 included questions on patient characteristics (e.g. age and gender) and 19 items on the presence of chronic diseases (e.g. diabetes) prior to the injury, to assess comorbidity [23]. A patient who had one or more chronic diseases in addition to the qualifying injury was defined as having comorbidity [24]. Level of education was divided into three categories: low, middle or high. Patients classified as low had no education, primary school or prevocational education as their highest level of education. Patients classified as middle had completed at most secondary or vocational education, and patients classified as high had completed higher professional education or university. Both surveys (T1, T2) included the EQ-5D-3L.

The EQ-5D-3L is a standardized generic HRQL measure [25]. It covers five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) and includes a visual analogue scale (EQ-VAS) [25]. Each dimension has three response options: no problems, moderate problems and extreme problems [26]. The ordinal dimension scores can be used descriptively, but can also serve as input to calculate an EQ-5D-3L summary score that combines all dimensions and ranges from 0 (death) to 1 (full health) [27]. For a few health states considered worse than death, the summary score can be lower than zero. The EQ-VAS ranges from 0 (worst imaginable health) to 100 (best imaginable health) and measures the patient's self-rated health subjectively. In addition to the complete EQ-5D-3L, the T2 questionnaire included the so-called 'recall test' and 'then test'. The recall test asked patients to report what they remembered having reported on the EQ-5D-3L on the previous occasion (T1). The then test asked patients to report what they now believed their health status had been at the previous assessment (T1). Both the recall test and the then test consisted of six items: the five EQ-5D-3L items and the EQ-VAS.
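To make the structure of the questionnaire concrete, the sketch below represents one EQ-5D-3L administration (five dimensions scored 1 = no problems, 2 = moderate problems, 3 = extreme problems, plus the EQ-VAS) and lists the dimensions on which two administrations differ, for example the direct T1 measurement versus the recall test. The class and function names are hypothetical illustrations, not part of any EQ-5D software. Converting a profile into a summary score requires a country-specific value set [27], which is deliberately left out here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The five EQ-5D-3L dimensions, in questionnaire order.
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)


@dataclass(frozen=True)
class EQ5D3LResponse:
    """Hypothetical container for one administration (direct, recall test or then test).

    levels: one level per dimension, 1 = no, 2 = moderate, 3 = extreme problems.
    vas: EQ-VAS score, 0 (worst imaginable) to 100 (best imaginable health).
    """

    levels: Tuple[int, int, int, int, int]
    vas: int

    def __post_init__(self) -> None:
        if len(self.levels) != len(DIMENSIONS):
            raise ValueError("expected one level per EQ-5D-3L dimension")
        if any(level not in (1, 2, 3) for level in self.levels):
            raise ValueError("EQ-5D-3L levels must be 1, 2 or 3")
        if not 0 <= self.vas <= 100:
            raise ValueError("EQ-VAS must lie between 0 and 100")

    @property
    def profile(self) -> str:
        """Five-digit health-state code, e.g. '11121'."""
        return "".join(str(level) for level in self.levels)


def changed_dimensions(a: EQ5D3LResponse, b: EQ5D3LResponse) -> List[str]:
    """Hypothetical helper: dimensions on which two administrations chose different levels."""
    return [name for name, x, y in zip(DIMENSIONS, a.levels, b.levels) if x != y]
```

For example, a direct T1 response `EQ5D3LResponse((2, 1, 3, 2, 1), vas=55)` has profile `"21321"`, and comparing it with a recall-test response `EQ5D3LResponse((2, 1, 2, 3, 1), vas=50)` flags usual activities and pain/discomfort, the kind of per-dimension disagreement tallied in the Results.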

The T2 survey also included the Impact of Event Scale (IES) [28]. The IES is a validated 15-item self-report questionnaire that assesses stress symptoms caused by a traumatic event. Each item is scored on a 4-point scale (0, 1, 3 or 5 points), where 0 means “not at all” and 5 means “extremely”. The total IES score ranges from 0 (no impact on any item) to 75 (maximal impact on all 15 items). PTSD was assumed to be present if the IES score exceeded 35 [29].
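The IES scoring rule described above can be sketched as follows. The function names are hypothetical; the sketch encodes only the item weights (0, 1, 3, 5), the 0 to 75 total range and the cut-off of 35 used in this study [29], and it assumes all 15 items are answered (no rule for missing items is implied).

```python
from typing import Sequence

IES_ITEM_VALUES = {0, 1, 3, 5}  # "not at all" = 0 ... "extremely" = 5
IES_N_ITEMS = 15
PTSD_CUTOFF = 35  # PTSD assumed present when the total exceeds this value


def ies_total(items: Sequence[int]) -> int:
    """Hypothetical helper: sum of the 15 IES item scores (range 0-75)."""
    if len(items) != IES_N_ITEMS:
        raise ValueError(f"expected {IES_N_ITEMS} items, got {len(items)}")
    if any(v not in IES_ITEM_VALUES for v in items):
        raise ValueError("IES items must be scored 0, 1, 3 or 5")
    return sum(items)


def probable_ptsd(items: Sequence[int]) -> bool:
    """Hypothetical helper: True when the IES total is strictly greater than 35."""
    return ies_total(items) > PTSD_CUTOFF
```

Because the criterion is strict, a total of exactly 35 (seven items scored 5 and eight scored 0) does not indicate PTSD, whereas 36 (twelve items scored 3) does.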

Injury data

In addition to the self-report data, clinical injury data of included trauma patients were obtained from the Brabant Trauma Registry, in which all BIOS hospitals participate. Injury data comprised the Injury Severity Score (ISS) [30] and the Abbreviated Injury Scale (AIS) [31]. The AIS is an anatomical scale that classifies each injury sustained by a patient by type, location and severity. The ISS is obtained by squaring and summing the highest AIS scores of the three most severely injured body regions. The ISS is an accepted summary score for the severity of a trauma and ranges from 1 to 75. Major trauma is assumed to be present if the ISS exceeds 15 [32]. The ISS was calculated automatically from the AIS scores registered in the Brabant Trauma Registry.
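The ISS rule described above can be illustrated with a short sketch. It assumes the six standard ISS body regions (head/neck, face, chest, abdomen, extremities/pelvic girdle, external) and applies the conventional rule that any AIS 6 (unsurvivable) injury sets the ISS to 75. It is not the Brabant Trauma Registry's own implementation, and the region labels and function names are hypothetical identifiers.

```python
from typing import Dict, Iterable, Tuple

# The six ISS body regions (labels are hypothetical identifiers).
ISS_REGIONS = {"head_neck", "face", "chest", "abdomen", "extremities", "external"}


def injury_severity_score(injuries: Iterable[Tuple[str, int]]) -> int:
    """Compute the ISS from (region, AIS severity) pairs, one pair per injury."""
    worst: Dict[str, int] = {}
    for region, ais in injuries:
        if region not in ISS_REGIONS:
            raise ValueError(f"unknown ISS region: {region}")
        if not 1 <= ais <= 6:
            raise ValueError("AIS severity must be between 1 and 6")
        # Only the most severe injury within each region counts.
        worst[region] = max(ais, worst.get(region, 0))
    if not worst:
        raise ValueError("at least one injury is required")
    if max(worst.values()) == 6:
        return 75  # by convention, any AIS 6 injury gives the maximum ISS
    top_three = sorted(worst.values(), reverse=True)[:3]
    return sum(ais * ais for ais in top_three)


def is_major_trauma(iss: int) -> bool:
    """Major trauma as defined in this study: ISS > 15 (i.e. ISS >= 16)."""
    return iss > 15
```

For a patient with a head injury of AIS 3, chest injuries of AIS 4 and 2, and an extremity injury of AIS 2, only the worst chest injury counts, giving ISS = 4² + 3² + 2² = 29, which qualifies as major trauma.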

Data analysis

SPSS version 23 was used for all analyses. We performed a non-response analysis to study whether responders differed from non-responders, using Mann-Whitney U tests for continuous variables and chi-square tests for categorical variables. Descriptive statistics were used to summarize the sample characteristics, EQ-5D-3L dimension scores, EQ-5D-3L summary scores and EQ-VAS scores.

T1 health outcomes were compared between subgroups (males vs. females, age < 65 vs. ≥ 65 years, absence vs. presence of pre-existing comorbidity, absence vs. presence of traumatic brain injury (TBI), ISS < 16 vs. ISS ≥ 16, and absence vs. presence of PTSD) using Mann-Whitney U tests. Similarly, a Kruskal-Wallis test was used to compare outcomes across the three levels of educational attainment.
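For readers less familiar with rank-based tests, the sketch below computes the Mann-Whitney U statistic that underlies the non-response and subgroup comparisons, with tied values receiving their average rank. It returns only the statistic; the p-values in this study came from SPSS, and obtaining them requires a normal approximation or the exact distribution, which are not shown here. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y); U_x counts pairs with x > y, ties counting one half."""
    ranks = midranks(list(x) + list(y))
    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[:n_x])
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x
```

The two statistics always sum to the number of cross-group pairs, so either one can be reported; for completely separated samples such as [1, 2, 3] versus [4, 5, 6] the result is (0, 9).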

The following equations were used to calculate conventional and retrospective change in HRQL:

$$ \text{Conventional change}_{\text{EQ-5D summary score},\,T1,T2} = \text{EQ-5D-3L summary score}_{T2} - \text{EQ-5D-3L summary score}_{T1} $$
(1)

Where EQ-5D-3L summary score T2 and EQ-5D-3L summary score T1 are the EQ-5D-3L summary scores measured directly at T2 and T1, respectively.

$$ \text{Conventional change}_{\text{EQ-VAS},\,T1,T2} = \text{EQ-VAS}_{T2} - \text{EQ-VAS}_{T1} $$
(2)

Where EQ-VAS T2 and EQ-VAS T1 are the EQ-VAS scores measured directly at T2 and T1, respectively.

$$ \text{Retrospective change}_{\text{EQ-5D summary score},\,T1,T2} = \text{EQ-5D-3L summary score}_{T2} - \text{EQ-5D-3L then test} $$
(3)

Where EQ-5D-3L summary score T2 is the EQ-5D-3L summary score measured directly at T2 and EQ-5D-3L then test is the EQ-5D-3L summary score of how the respondents believed their health status had been at the previous assessment (T1).

$$ \text{Retrospective change}_{\text{EQ-VAS},\,T1,T2} = \text{EQ-VAS}_{T2} - \text{EQ-VAS then test} $$
(4)

Where EQ-VAS T2 is the EQ-VAS score measured directly at T2 and EQ-VAS then test is the EQ-VAS score the respondents believed they had at the previous assessment (T1).
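Equations (1) to (4) apply the same two subtractions to both instruments, so one helper covers the summary score and the EQ-VAS alike. The sketch below computes both change scores for a single patient; the function name and the example values are hypothetical and serve only to show how a then test that rates T1 health lower than the direct T1 measurement makes retrospective change exceed conventional change.

```python
from typing import NamedTuple


class ChangeScores(NamedTuple):
    conventional: float   # equations (1)/(2): T2 minus directly measured T1
    retrospective: float  # equations (3)/(4): T2 minus then-test rating of T1


def change_scores(t1: float, t2: float, then_test: float) -> ChangeScores:
    """Hypothetical helper: conventional and retrospective change for one instrument."""
    return ChangeScores(conventional=t2 - t1, retrospective=t2 - then_test)
```

For an illustrative patient with an EQ-VAS of 50 at T1, 75 at T2 and a then-test rating of 40, `change_scores(t1=50, t2=75, then_test=40)` gives a conventional change of 25 and a retrospective change of 35.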

See Fig. 1 for a schematic overview of the calculations of conventional and retrospective change, recall bias and response shift.

Fig. 1 Schematic overview of the calculations of conventional and retrospective change, recall bias and response shift

Wilcoxon signed-rank tests were used to compare conventional and retrospective change in EQ-5D-3L summary scores and EQ-VAS scores for the total sample and for subgroups (males vs. females, age < 65 vs. ≥ 65 years, absence vs. presence of pre-existing comorbidity, absence vs. presence of TBI, ISS < 16 vs. ISS ≥ 16, and absence vs. presence of PTSD). We examined whether conventional and retrospective change in EQ-5D-3L summary scores and EQ-VAS scores differed between subgroups, and whether this was in accordance with our hypotheses. We calculated the intraclass correlation coefficient (ICC) to assess agreement between conventional and retrospective change at patient level [33], for the whole group and for subgroups (age, gender, educational level, comorbidity status, ISS category and PTSD symptoms). ICC was classified as poor (< 0.40), fair (0.40–0.59), good (0.60–0.74) or excellent (0.75–1.00) [34].
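The text does not state which ICC form was used [33]. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1) or ICC(A,1)), treating conventional and retrospective change as two "raters" per patient, and maps the result onto the poor/fair/good/excellent bands above [34]. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def icc_absolute_agreement(pairs: Sequence[Tuple[float, float]]) -> float:
    """Assumed form: two-way random, absolute-agreement, single-measure ICC.

    pairs: one (conventional change, retrospective change) tuple per patient.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two patients")
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]

    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between methods
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc: float) -> str:
    """Bands used in this study: poor, fair, good, excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"
```

Because this form rewards absolute agreement rather than mere correlation, a constant offset between the two change scores lowers it: the pairs (1, 2), (2, 3), (3, 4) are perfectly correlated but yield an ICC of 2/3, which falls in the "good" band.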

Recall bias and response shift were calculated with the following equations:

$$ \text{Recall bias}_{\text{EQ-5D summary score}} = \text{EQ-5D-3L recall test} - \text{EQ-5D-3L summary score}_{T1} $$
(5)

Where EQ-5D-3L recall test is the EQ-5D-3L summary score of the health status that the respondents remembered having reported at the previous assessment (T1) and EQ-5D-3L summary score T1 is the EQ-5D-3L summary score measured directly at T1.

$$ \text{Recall bias}_{\text{EQ-VAS}} = \text{EQ-VAS recall test} - \text{EQ-VAS}_{T1} $$
(6)

Where EQ-VAS recall test is the EQ-VAS score that the respondents remembered having reported at the previous assessment (T1) and EQ-VAS T1 is the EQ-VAS score measured directly at T1.

$$ \text{Response shift}_{\text{EQ-5D summary score}} = \text{EQ-5D-3L then test} - \text{EQ-5D-3L summary score}_{T1} $$
(7)

Where EQ-5D-3L then test is the EQ-5D-3L summary score of how the respondents believed their health status had been at the previous assessment (T1) and EQ-5D-3L summary score T1 is the EQ-5D-3L summary score measured directly at T1.

$$ \text{Response shift}_{\text{EQ-VAS}} = \text{EQ-VAS then test} - \text{EQ-VAS}_{T1} $$
(8)

Where EQ-VAS then test is the EQ-VAS score reflecting how the respondents believed their health status had been at the previous assessment (T1) and EQ-VAS T1 is the EQ-VAS score measured directly at T1.
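Subtracting equation (3) from equation (1) makes the relation between the two change scores and response shift explicit: the T2 terms cancel, and what remains is exactly the right-hand side of equation (7). In the short derivation below, S is an abbreviation (introduced here only for brevity) for the EQ-5D-3L summary score; the same algebra applies to the EQ-VAS via equations (2), (4) and (8).

$$
\begin{aligned}
\text{Conventional change} - \text{Retrospective change}
&= \left(S_{T2} - S_{T1}\right) - \left(S_{T2} - S_{\text{then test}}\right) \\
&= S_{\text{then test}} - S_{T1} \\
&= \text{Response shift}_{\text{EQ-5D summary score}}
\end{aligned}
$$

Under these definitions, a negative response shift (a then-test rating of T1 health below the direct T1 measurement) corresponds, for that patient, to retrospective change exceeding conventional change.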

Wilcoxon signed-rank tests were used to examine differences between response shift and recall bias. Differences were also studied at subgroup level to examine whether recall bias and response shift differed between the defined subgroups. To estimate the role of background factors in recall bias and response shift, respectively, we predicted recall bias and response shift from socio-demographic factors (age, gender, education), comorbidity, TBI (yes or no), injury severity (ISS as a continuous variable) and PTSD symptoms (IES score as a continuous variable), using univariate and multivariate linear regression analysis with backward selection (deselection criterion p < 0.10).
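The paired comparisons in this study (recall bias versus response shift, and conventional versus retrospective change) rely on the Wilcoxon signed-rank test. As an illustration only, the sketch below computes the signed-rank statistics W+ and W− from paired values, dropping zero differences and giving tied absolute differences their average rank; this is one common convention, and SPSS's exact handling is not described in the text. It returns the statistics only, not the Z values or p-values reported in the Results, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def signed_rank_statistics(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: (W_plus, W_minus) for the paired differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Group tied absolute differences so they share an average rank.
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

Applied per patient to (response shift, recall bias) pairs, W+ + W− always equals m(m + 1)/2, where m is the number of non-zero differences; a strong imbalance between the two sums indicates that one quantity is systematically larger.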

Overall, p-values < 0.05 were considered statistically significant, although our analysis was primarily exploratory.

Results

Study population

In total, 1518 of the 5731 invited patients participated in the BIOS study (26.5%). Responders were significantly younger than non-responders (p < 0.05) and significantly more often male (p < 0.05). In total, 790 patients responded to the T1 survey and 1351 patients responded to the T2 survey. However, only 550 of these patients completed the EQ-5D-3L and EQ-VAS at T1 and T2 as well as the then test and the recall test (EQ-5D-3L and EQ-VAS, both at T2), and were therefore included in this study. These 550 completers were significantly more often male, younger and higher educated, and had a shorter hospital stay than non-completers. The completers had a mean age of 61.0 years (SD 16.0), and slightly more than half (56.0%) were male (Table 1). Most participants had a middle or high level of education, and comorbidity was highly prevalent (56.2%). Median hospital stay was 4.0 days (IQR 2.0–6.0), and the most common injuries were mild traumatic brain injury (28.9%) and hip fracture (20.7%). Median ISS was 5.0 (IQR 4.0–9.0).

Table 1 Characteristics of study population

EQ-5D-3L – conventional change versus retrospective change

Table 2 shows the mean EQ-5D-3L summary scores at T1 and T2, the mean then test, and the mean conventional and retrospective change in EQ-5D-3L summary score between T1 and T2. Mean EQ-5D-3L summary scores at T1 and T2 were 0.482 (SD 0.30) and 0.735 (SD 0.24), respectively. A lower EQ-5D-3L summary score at T1 was associated with younger age, an ISS ≥ 16, absence of TBI and presence of PTSD three months post-injury (all p < 0.05). Individual agreement between retrospective and conventional change was fair (ICC = 0.49, p < 0.05) (see Table 2). Retrospective change was significantly higher than conventional change (Z = −5.2, p < 0.05). The difference between conventional and retrospective change was largest among patients with ISS ≥ 16 (mean difference = −0.12, Z = −1.9, p = 0.058).

Table 2 Mean EQ-5D-3L summary score at T1, conventional and retrospective change in EQ-5D-3L summary score between T1 and T2, and magnitude of recall bias and response shift

EQ-5D-3L – recall bias versus response shift

Recall bias and response shift are also shown in Table 2. Mean recall bias ranged from −0.09 (patients with PTSD) to −0.004 (patients with TBI). Overall, the recalled T1 EQ-5D-3L summary score was lower (−0.02) than the directly assessed score, except among males and patients with a high educational level (all p < 0.05). Agreement between recall bias and response shift was good (ICC = 0.68, p < 0.05). Multivariate linear regression analysis indicated that more severe PTSD symptoms were associated with recalling the T1 EQ-5D-3L as lower (‘worse’) than the directly assessed EQ-5D-3L at T1 (see Table 3). The EQ-5D-3L dimensions that differed most frequently between the directly assessed EQ-5D-3L at T1 and the recall test were usual activities (36.5% of respondents chose a different response option on the recall test), pain and/or other complaints (34.4%), self-care (28.7%) and anxiety/depression (27.5%).

Table 3 Multivariate models for recall bias based on the EQ-5D-3L summary score

Mean response shift ranged from −0.12 (patients with an ISS ≥ 16) to −0.04 (patients with a high educational level and patients older than 65 years). Multivariate linear regression analysis indicated that more severe PTSD symptoms were significantly associated with greater response shift (see Table 4). The EQ-5D-3L dimensions that differed most frequently between the directly assessed EQ-5D-3L at T1 and the then test used to assess response shift were pain and/or other complaints (35.6% of respondents chose a different response option on the then test), usual activities (34.5%), self-care (31.3%) and anxiety/depression (27.3%). Response shift, with a mean of −0.06, was significantly larger than recall bias (Z = −4.5, p < 0.05).

Table 4 Multivariate models for response shift based on the EQ-5D-3L summary score

EQ-VAS – conventional change versus retrospective change

Table 5 shows the mean EQ-VAS score at T1 and the mean conventional and retrospective change in EQ-VAS score between T1 and T2. Mean EQ-VAS score improved from 56.3 (SD 20) at T1 to 72.6 (SD 17) at T2. A lower EQ-VAS score at T1 was associated with female gender, younger age, an ISS ≥ 16 and absence of TBI (all p < 0.05). Individual agreement between retrospective and conventional change in EQ-VAS was fair (ICC = 0.483, p < 0.05) (see Table 5). Retrospective change in EQ-VAS score was significantly higher than conventional change (Z = −2.1, p < 0.05). The difference between conventional and retrospective change in EQ-VAS was particularly large among patients with PTSD (mean difference = −7.7, Z = −2.4, p < 0.05), patients with an ISS ≥ 16 (mean difference = −6.6, Z = −1.7, p = 0.09) and patients with TBI (mean difference = −4.7, Z = −2.9, p < 0.05).

Table 5 Mean EQ-VAS score at T1, conventional and retrospective change in EQ-VAS score between T1 and T2, and magnitude of recall bias and response shift

EQ-VAS – recall bias versus response shift

On average, the recalled T1 EQ-VAS was 0.6 points lower (‘worse’) than the directly assessed T1 EQ-VAS (p < 0.05). Mean recall bias ranged from −7.3 for patients with PTSD to −0.3 for patients with comorbidity. Overall, the recalled T1 EQ-VAS was lower than the directly assessed EQ-VAS, except among patients aged 65 and older, patients without TBI and patients with a high educational level (all p < 0.05). Agreement between recall bias and response shift was excellent (ICC = 0.78, p < 0.05). Multivariate linear regression analysis indicated that more severe PTSD symptoms and having TBI were significantly associated with a lower recalled T1 EQ-VAS relative to the directly assessed EQ-VAS (see Table 6). With a mean of −1.6, response shift was larger than recall bias, but this difference was not significant (Z = −0.635, p = 0.53). Response shift was largest for patients with PTSD (mean −7.6), patients with an ISS ≥ 16 (mean −6.7) and patients with TBI (mean −4.8), and smallest for patients without TBI (mean −0.2) and patients older than 65 years (mean −0.3). Multivariate linear regression analysis indicated that more severe PTSD symptoms and having TBI were associated with response shift (see Table 7).

Table 6 Multivariate models for recall bias based on the EQ-VAS
Table 7 Multivariate models for response shift based on the EQ-VAS

Discussion

Our study showed that retrospective change in HRQL exceeded conventional change and that, at the individual level, agreement between conventional and retrospective change was only fair for both the EQ-5D-3L summary score and the EQ-VAS. Response shift, more than recall bias, modified the reported retrospective outcome.

The relative magnitude of recall bias and response shift was higher on the EQ-VAS than on the EQ-5D-3L summary score. This may be because the restricted range of responses of the EQ-5D-3L leads to less variability in scores than the continuous EQ-VAS [13]. The subjectivity of the scale, which is greater for the VAS than for the classification-like EQ-5D-3L, played a much smaller role than expected. We expected individual agreement between conventional and retrospective change to be higher for the EQ-5D-3L summary score than for the EQ-VAS; however, the individual agreement was similar.

In agreement with our expectations, response shift was larger among trauma patients with severe injuries (ISS ≥ 16). This suggests that high-impact trauma requires more adaptation to one's health status than less severe trauma. Our findings also showed that the magnitude of response shift is already considerable relatively shortly after injury. This agrees with a study among individuals with stroke that found a similarly large response shift 24 weeks post-stroke [35].

Contrary to our expectations, we did not find that recall bias was larger in older respondents than in their younger counterparts. This finding may be explained by participation bias. The response rate in our study was rather low, and the elderly who did participate may not have been representative of elderly trauma patients, in that they may experience fewer memory problems than elderly non-respondents.

We did find that response shift increased with increasing PTSD symptoms. This may indicate that PTSD symptoms affect the cognitive dissonance between the respondent's actual and desired health state. In addition, both TBI and PTSD were positively associated with recall bias. Like TBI, PTSD is associated with impairments in cognitive functioning [36,37,38]. Our findings confirm that cognitive impairment is a non-trivial factor in research where recall bias may occur.

Agreement between conventional and retrospective change in HRQL was similar to that reported by McPhail et al. and, as in their study, retrospective change exceeded conventional change for both the EQ-5D-3L summary score and the EQ-VAS [10]. However, McPhail et al. reported a much larger difference between conventional and retrospective change and larger recall bias. This discrepancy may be explained by differences in study population and/or timing of HRQL assessments. McPhail et al. studied hospitalized elderly, whereas we studied hospitalized trauma patients aged 18 and older. The higher age of their respondents may have contributed to the larger difference between conventional and retrospective change and the larger recall bias, although our study did not show important differences in conventional versus retrospective change, recall bias or response shift between younger and older trauma patients. With regard to timing, McPhail et al. measured HRQL immediately after hospital admission (T1) and immediately after discharge (T2), whereas our first HRQL measurement was 1 week post-injury. As a result, the change in HRQL between T1 and T2 measured by McPhail et al. may have been much larger, and so may the contribution of recall bias and response shift to the difference between conventional and retrospective change.

Strengths and limitations

Strengths of our study were the meticulous protocol and the regional coverage, with a high number of respondents. The high number of respondents allowed us to test for differences between conventional and retrospective change in HRQL and to assess recall bias and response shift in specific subgroups of trauma patients, such as patients with severe injury and patients with TBI. The combined use of the EQ-5D-3L and the EQ-VAS allowed us to compare the difference between conventional and retrospective change, and the contributions of recall bias and response shift, between a subjective scale and a classification-like scale.

A limitation of our study was the uniform 3-month interval between the first and second measurement, which limits conclusions on the duration dependence of the bias effects. Stronger recall bias may be expected over longer periods, with recall bias perhaps becoming more important than response shift [11]. Had an interval of, e.g., 12 months been chosen instead of 3 months, the effect of recall bias might have been more pronounced.

A second limitation was that the T2 survey included the EQ-5D-3L and EQ-VAS for the direct measurement of current HRQL as well as the recall and then tests regarding the respondents' HRQL at T1. This meant that respondents had to complete several similarly worded questions, which was a challenging task. This may have affected both the number of respondents with complete responses and the quality of the responses. Since we administered stand-alone paper-and-pencil surveys, we could not verify whether respondents understood the recall and then tests and the difference between them.

A third limitation is the use of the EQ-5D-3L rather than the EQ-5D-5L. Because the EQ-5D-5L has five response options instead of three, providing greater sensitivity and precision, contrasts might have been larger [39,40,41]. For future studies investigating recall bias and response shift, we recommend using the EQ-5D-5L.

Implications for clinical practice

The findings of our study confirm that, in our sample of trauma patients, conventional and retrospective change in HRQL disagreed, and that both recall bias and response shift contributed to this difference. This should be taken into account when change in HRQL is used to evaluate health interventions in this patient group. Whether conventional or retrospective change should be used for evaluation depends strongly on the aims of the intervention, the characteristics of the patients, and which perspective on change matters most to the various stakeholders, as also pointed out by McPhail et al. [10].

Conclusions

We conclude that response shift contributed more than recall bias to the disagreement between conventional and retrospective change in the EQ-5D-3L summary score and the EQ-VAS. Identifiable subgroups of trauma patients, such as patients who sustained TBI and patients with PTSD symptoms 3 months post-injury, were more susceptible to recall bias and response shift.