Background

Worldwide, the global burden of disability continues to increase as a consequence of population growth, reductions in mortality due to improvements in healthcare, and the ageing of populations [1]. This presents a significant challenge for health systems which face growing demand for services designed to reduce the impact of disability on quality of life [2]. Injury has been identified as a key contributor to the global disability burden, particularly in high and middle-income countries [1]. Despite a notable decline in deaths from injury over time, non-fatal injuries remain a leading cause of hospitalisation [3]. The age-adjusted annualised rate of injuries requiring some form of medical treatment was approximately 126 per 1000 members of the United States (US) population in 2014 [4]. Current information regarding the impact of injury on subsequent disability is essential to plan for the effective allocation of available resources within health systems in order to promote optimum recovery from injury. This information can also be disseminated to patients to ensure they have accurate expectations for their recovery, and may be useful in the development of targeted interventions designed to minimise disability after injury.

While some information is available on the incidence of both fatal and non-fatal injuries, these data do not adequately depict the long-term consequences for injured individuals [3]. As a result, measures of health-related quality of life (HRQL), often assessing functional status (an important component of disability) [5], are increasingly utilised to quantify the effect of injury on population health [6]. HRQL measures, including generic and disease-specific measures, aim to provide a comprehensive estimation of health, and are often self-reported [7]. When examining outcomes following injury it is useful to use generic HRQL measures as these enable comparison of outcomes and recovery patterns within and between different injury populations [8]. Such measures also allow for comparisons between injured individuals and members of the general population, and with people with other health conditions [9]. This information can be used to inform approaches to rehabilitation and effective community reintegration.

Most generic HRQL measures are comprised of items that aim to measure health in relation to a broad range of dimensions, such as physical health, psychological health, mobility, social relationships, and environmental health [10]. There are different approaches to the reporting of findings obtained using these measures. Some studies report the proportion of individuals experiencing difficulties with respect to particular HRQL dimensions, while others report summary scores for each dimension (e.g. means and standard deviations/confidence intervals), and/or a global HRQL score based on the sum of all items within the measure. Some measures derive utility scores (weights) which are often determined by asking members of the general population to provide their ‘preferences’ for certain health states. Utility scores are commonly used in economic evaluations, incorporating the impact of injury on both quantity and quality of life [11]. Although there are various approaches to reporting findings from measures of HRQL, each approach can be used to understand patterns of HRQL over time for people with a broad range of injuries, highlighting potential pathways to recovery.

An earlier systematic review was conducted to examine studies that had measured HRQL using a generic instrument among general injury populations, in order to summarise existing knowledge in this area [12]. The review included studies conducted during 1995–2009 and found a lack of consensus on preferred HRQL instruments and study designs for the measurement of injury-related outcomes [12]. A total of 24 different generic HRQL and functional status measures were identified in the 41 studies meeting inclusion criteria. The most frequently used measures included the Medical Outcome Study Short Form-36 items (SF-36), the Functional Independence Measure (FIM), the Glasgow Outcome Scale (GOS), and the EQ-5D-3L. These measures were found to be administered at a range of different time points post-injury, with follow-up most commonly occurring at 6, 12 and 24 months. Twelve studies reported HRQL utility scores. Overall, studies found that while significant recovery occurred in the first year post-injury, deficits from full recovery continued up to 2 years post-injury (when compared with population norms or pre-injury health status) [12]. This was observed among populations with a broad range of injury severities, as well as severely injured populations.

Given the increasingly recognised importance of documenting the HRQL outcomes experienced by specific subpopulations, including individuals with injury [13], it is expected that many additional studies will have used generic health state measures among general injury populations since 2009 [14, 15]. However, it is unclear exactly how many studies have been conducted, how studies reported HRQL findings, and whether there has been greater consistency in study designs (including use of HRQL instruments, study populations, and assessment time points). It is possible that greater consistency in study designs may have been facilitated by the publication of the European Consumer Safety Association guidelines for undertaking follow-up studies measuring injury-related disability in 2007 [16]. These guidelines recommend the use of both the EQ-5D and the Health Utilities Index Mark III (HUI3) in all studies examining injury-related disability, with assessments at 1, 2, 4 and 12 months post-injury in addition to a pre-injury assessment. The earlier systematic review concluded that the guidelines were not being followed; yet this may have been because included studies had already finalised their protocol and/or data collection prior to the publication of the guidelines.

In order to gain contemporary information on injury outcomes and to investigate whether there has been an increase in the consistency of study designs since 2009, we conducted an updated systematic review of studies measuring HRQL with a generic instrument in general injury populations. Increased consistency in study designs would allow for improved comparisons between studies and increased precision in estimates of the burden of injury over time. As in the earlier review, we aimed to identify: i) which generic HRQL measures were used; ii) what methods were used to administer the measures; iii) the time points at which HRQL was measured; iv) how HRQL findings were reported; and v) whether changes in HRQL over time, and predictors of HRQL, were assessed. We also explored whether studies eligible for inclusion used HRQL measures with properties that meet widely accepted recommendations in the field (with respect to internal consistency, reliability, measurement error, content validity, construct validity, criterion validity, responsiveness, and interpretability) [17]. Studies using appropriate measures and consistent designs are essential to ensure that accurate information on the burden of injury is available, allowing for the effective targeting of resources to maintain HRQL after injury.

Methods

Data sources and strategy

A new search of empirical studies on the HRQL of general injury populations was conducted. The search strategy that was developed for the systematic review of Polinder et al. [12] was updated in collaboration with a librarian specialising in literature searches. In order to match the database specific indexing terms, the search strategy was adjusted for the different electronic databases: Embase, PubMed (Medline Ovid), Web of Science and PsycINFO. The terms used in the search strategy were: ‘quality of life’ and ‘health related quality of life’, ‘functional status assessment’, ‘injury’ and ‘trauma’, and ‘cohort analysis’ (complete search strategy in Appendix 1). Articles were included in the search if the period of publication was between 2010 and 2018, and if they were peer-reviewed. The reference lists of the included articles were also screened, in order to detect additional articles that were relevant, and to identify important key terms. Details of the systematic review process were successfully registered and published within the PROSPERO database (registration number CRD42019120207).

Selection criteria

To be included in this review, studies had to use a generic HRQL or disability measure at more than one time point in a population of injury/trauma patients. While HRQL and disability are unique constructs, the World Health Organization International Classification of Functioning, Disability and Health (ICF) acknowledges the relationship between disability and HRQL, particularly with respect to participation in activities of daily living [5]. For the purpose of this review, the World Health Organization (WHO) definition of disability is used. The WHO defines disability as an umbrella term reflecting impairments, activity limitations, and participation restrictions [18]. The concept of HRQL is more specific, reflecting an individual’s or population’s perceptions of health (mental and physical) and functional status [19]. Several measures of disability, such as the World Health Organization Disability Assessment Schedule (WHODAS) based on the ICF, can be used to evaluate not only disability but also HRQL [20].

Additional inclusion criteria were publication in English and in a peer-reviewed journal between 2010 and 2018. Studies that focused on only one specific injury population, such as traumatic brain injury patients, were excluded as only studies with a general injury population were the focus of this review. Furthermore, studies measuring HRQL in people other than individuals with injury were excluded, as were studies employing non-generic HRQL instruments, and review and pilot studies. There was no restriction on age or injury severity. Therefore, studies focusing on a specific age group or specific injury severity, but not focusing on a specific injury, were included.

Data extraction and quality assessment

After completion of the database searches, relevant articles were selected in three steps. First, the titles of the articles were screened; next, the abstracts of the articles selected in step one were screened; and finally, the full texts of the articles selected in step two were read. By screening the titles, abstracts and articles, it was determined whether an article should be included or not according to the selection criteria. The screening procedure was conducted by two researchers independently (AG and AR). In cases of disagreement between the two researchers, a third researcher (JH) was consulted. This researcher also checked a sample of abstracts (n = 50) in order to quality assure the process. The full articles that were eligible for inclusion were then analysed by two reviewers (AG and AR), using a modified version of the data extraction form developed for the original review by Polinder et al. [12].

The methodological quality of each study was independently assessed by two researchers (AG and AR) using three items from the RTI Item Bank on Risk of Bias and Precision of Observational Studies [21]. This item bank consists of 29 items designed to evaluate the quality of observational studies of interventions or exposures. It is recommended to select items that can evaluate the most critical threats to validity associated with the studies under investigation. For this review, items 16, 17, and 18 were selected for use; each of these items addresses potential bias associated with follow-up assessments in longitudinal studies. In addition, alignment of studies with the Guidelines for the Conduction of Follow-up Studies Measuring Injury-Related Disability was analysed [16].

The results of all studies were tabulated in order to identify the different measures used, the methods of reporting HRQL information (e.g. summary scores), and whether any changes in HRQL over time were observed. For studies presenting HRQL summary scores, the scores could range from either 0 to 1 or 0 to 100 depending on the measurement instrument used. Two examples of generic HRQL instruments that can be used to derive a summary score are the EQ-5D and the SF-36. With respect to disability, an example of an instrument that can be used to derive a summary score is the WHODAS II [22]. For all instruments examined, lower scores were representative of worse health.
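As a minimal sketch of how summary scores reported on the two ranges mentioned above could be placed on a common 0 to 100 scale for tabulation, consider the following. The function name and the example values are hypothetical and are not taken from any included study; the sketch assumes scores are bounded at 0, as described here.

```python
def to_percent_scale(score: float, scale_max: float) -> float:
    """Rescale a summary score to 0-100 given the instrument's maximum (1 or 100)."""
    # Assumption: only the two ranges described in the text (0-1 and 0-100) occur.
    if scale_max not in (1, 100):
        raise ValueError("scale_max must be 1 or 100")
    if not 0 <= score <= scale_max:
        raise ValueError("score lies outside the instrument's range")
    return score * 100 / scale_max


# Hypothetical values: a 0-1 index score and a 0-100 summary score.
index_score = to_percent_scale(0.75, 1)
summary_score = to_percent_scale(62.0, 100)
```

Rescaling only aligns the axes; scores from different instruments remain conceptually distinct and are not directly comparable.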

Results

Literature search

The search strategy in the specified databases provided a total of 8152 unique potentially relevant articles (see Fig. 1). One additional article that did not turn up in our search was extracted from the reference list of an included study, and added to the relevant titles. In the first selection round, based on scanning the titles, 7386 articles were excluded. The main reasons for exclusion were that studies were not about injury or were about a specific injury type, rather than injury in general. The abstracts of the remaining 766 articles were read in the next selection round, resulting in the exclusion of 668 more articles due to a lack of HRQL measurement. The full texts of the remaining 98 articles were read, and led to the final inclusion of 44 articles. These articles represent 29 unique studies. The main reason for final exclusion of 54 articles was a lack of a sufficient HRQL measurement or the lack of multiple HRQL measurements.
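The selection counts reported above can be reconciled with a short calculation. This is a sketch only: the variable names are ours, and it starts from the 8152 database records because the text does not specify at which stage the one additional reference-list article enters the flow.

```python
# Counts as reported in the text above.
identified = 8152          # unique articles from the database searches
excluded_on_title = 7386
excluded_on_abstract = 668
excluded_on_full_text = 54

abstracts_screened = identified - excluded_on_title
full_texts_read = abstracts_screened - excluded_on_abstract
articles_included = full_texts_read - excluded_on_full_text

print(abstracts_screened, full_texts_read, articles_included)
```

Each intermediate value matches the figure quoted for the corresponding selection round (766 abstracts, 98 full texts, 44 included articles).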

Fig. 1 PRISMA flow diagram

Study characteristics

Study characteristics are presented in Table 1. Out of the 44 articles that were included in our systematic review, most (n = 12) reported findings from a single prospective cohort study conducted in New Zealand [14, 27,28,29,30,31,32,33,34,35,36,37]. Seven articles were published using data from Australia [24,25,26, 39,40,41,42], with two articles related to the same study cohort from Victoria [41, 42] and two articles related to the same cohort from South-East Queensland [25, 26]. Five articles reported on five unique studies conducted in the United States [38, 53, 57, 64, 65]. Three articles resulted from two studies in Switzerland [43,44,45] and three articles resulted from two studies in Norway [59, 60, 62]. Two articles from two different studies were identified from each of Italy [46, 47] and Sweden [52, 56]. Remaining articles were from studies conducted in Hong Kong [55], India [48], British Columbia [58], Iran [23], Spain [50], United Kingdom [49], Thailand [63], Japan [61] and Vietnam [51] (all n = 1). One study was a multicentre study, conducted in both Australia and Hong Kong [54]. The sample sizes for each investigation ranged from 105 to 87,134, with the majority of the samples in the range of 105 to 668 participants (n = 28). Four studies measured HRQL in children and adolescents [48, 53, 57, 58], while all other studies focussed on adult populations. All studies included a non-specific injury population, with differing injury severities.

Table 1 Study characteristics of included articles measuring HRQL in general injury populations

Approximately a third (n = 10) of all studies focused on all injury severities, with a main inclusion criterion of hospital admission or injuries likely to result in insurance claims for more than just medical treatment. The largest group of studies focussed on major injuries (n = 14). Inclusion criteria varied, with some studies only requiring ≥24 h stay at the hospital or admission to intensive care unit (ICU) (n = 7), and other studies requiring a minimum score on the ISS (Injury Severity Score) or NISS (New Injury Severity Score). ISS for major injuries ranged from ISS > 12 (n = 2) to ISS ≥16 (n = 2), versus NISS ranging from NISS ≥8 (n = 1) to NISS ≥16 (n = 2). The remaining 5 studies focused on moderate (n = 3) or mild to moderate (n = 2) injuries, with moderate injury studies requiring AIS (Abbreviated Injury Scale) ≥2 (n = 1) or ISS ≥9 (n = 2), and mild to moderate injury studies requiring ISS < 15 (n = 1) and length of hospitalisation < 24 h (n = 1).

Study design

All studies that were included in this review were prospective cohort studies. Seven out of the 29 unique studies were multicentre studies [24, 48, 49, 52, 54,55,56]. Across studies, HRQL and disability were measured with 14 different measurement instruments. The generic instruments SF-36 (n = 13) and EQ-5D (n = 7) were most commonly used, followed by SF-12 (n = 6) and GOSE (n = 4), as shown in Fig. 2. Approximately 45% of the studies (n = 13) used more than one measurement instrument, of which 10 used two instruments, and 3 used more than two instruments. All measurement instruments were generic, with three out of four studies in children using a child-specific instrument (PedsQL; PedsQL 4.0; PedsQL infant scales) only, and one study in children using two all-ages instruments (SF-12 and SF-36). Measurement of HRQL was conducted at different time points across studies, with the number of follow-up points varying from one (n = 4) to five (n = 3). HRQL was assessed at more than one follow-up point in 25 studies, with measurement at 6 and 12 months most frequent across all studies (n = 14 and n = 19, respectively) (Fig. 3). Three other common measurement points were 24 months (n = 12), 1 month (n = 9) and 3 months (n = 7) after injury. Studies used different methods to administer questionnaires, with telephone interview as the most common method (n = 13). A combination of different methods was common, with baseline measurement often performed in a face-to-face interview, and later follow-up measurements done by either telephone or postal/email interview.

Fig. 2 Frequency of generic measures used in studies to assess HRQL. Note 1: Some studies used more than one measurement instrument. Note 2: ‘Other’ consists of: GOS (2), HUI3 (1), MOS-SF-8 (1), MFA (1), TOP (1), FIM (1), PedsQL 4.0 Generic core (1), PedsQL infant scales (1)

Fig. 3 Frequency of time points at which HRQL was measured across studies

Quality of studies

Length of follow-up was consistent for all study participants in all but two studies [25, 26, 56]. Similarly, follow-up time was sufficient for measuring primary outcomes in all but two studies [24, 47]. However, attrition appeared to be a problem in many studies: 18 out of 29 studies exceeded the attrition thresholds of 20% for < 1 year follow-up and 30% for ≥1 year follow-up.
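The attrition criterion described above can be expressed as a simple check. This is an illustrative sketch rather than the procedure used in the review: the function name, its arguments, and the example numbers are hypothetical, and follow-up length is assumed to be given in months.

```python
def exceeds_attrition_threshold(enrolled: int, completed: int,
                                follow_up_months: float) -> bool:
    """Return True if attrition exceeds 20% (< 1 year) or 30% (>= 1 year)."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need 0 <= completed <= enrolled and enrolled > 0")
    attrition = 1 - completed / enrolled
    threshold = 0.20 if follow_up_months < 12 else 0.30
    return attrition > threshold


# Hypothetical cohort: 200 enrolled, 150 followed up (25% attrition).
flag_at_6_months = exceeds_attrition_threshold(200, 150, 6)
flag_at_12_months = exceeds_attrition_threshold(200, 150, 12)
```

With these example numbers the same 25% loss would be flagged for a 6-month follow-up but not for a 12-month follow-up, which shows why the follow-up duration matters when applying the thresholds.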

Regarding adherence to the Guidelines for the Conduction of Follow-up Studies Measuring Injury-Related Disability, it was found that study populations were generally in accordance with the guidelines. However, measurement in respondents with mental and/or social problems was only specifically mentioned in two studies [40, 48], whereas all other studies provided no or unclear information on the subject. Even though the guidelines recommend a combination of the EQ-5D and HUI3 to measure HRQL, none of the included studies used this combination. The EQ-5D and HUI3 were used separately in a number of studies [14, 30,31,32,33,34,35, 39, 42,43,44,45, 49,50,51]. Six studies complied with the measurement points required by the guidelines, namely one, two, four and 12 months after injury [48, 49, 51, 58, 64, 65]. Even though other studies did not follow all required measurement points, the majority complied with at least one.
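One way to compare a study's follow-up schedule with the post-injury time points recommended by the guidelines is sketched below. The function and the example schedules are hypothetical; the sketch covers only the 1, 2, 4 and 12 month assessments and ignores the recommended pre-injury assessment.

```python
GUIDELINE_MONTHS = frozenset({1, 2, 4, 12})  # post-injury time points from the guidelines


def guideline_overlap(study_months):
    """Summarise which guideline time points a follow-up schedule covers."""
    measured = set(study_months)
    return {
        "matched": sorted(GUIDELINE_MONTHS & measured),
        "missing": sorted(GUIDELINE_MONTHS - measured),
        "fully_compliant": GUIDELINE_MONTHS <= measured,
    }


# Hypothetical schedules, in months after injury.
partial = guideline_overlap([6, 12, 24])
full = guideline_overlap([1, 2, 4, 12])
```

A schedule of 6, 12 and 24 months, for instance, matches only the 12-month point, which corresponds to a study that complies with at least one but not all required measurement points.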

Predictors for HRQL

Recovery patterns of HRQL after injury were found to differ across subgroups in most studies. There was substantial variation in the predictors of HRQL after injury; however, seven predictors were mentioned in six or more articles: age (n = 14), gender (n = 12), pre-injury health status (n = 12), hospitalisation status (n = 7), nature of injury (n = 7), injury severity (n = 7) and socio-economic status (n = 6). Older age and female gender were found to have a negative impact on the outcome of HRQL after trauma in several articles [24, 31, 41, 47, 50, 51], whereas in two other articles male gender was found to have a negative association with HRQL [45, 55].

Changes over time

Studies that reported HRQL values generally reported improvements in HRQL over time (see Table 1). However, not all studies that were included reported specific outcomes of HRQL, as some studies reported odds ratios and relative risks. Improvement in HRQL was found in all studies; however, pre-injury status or population level was not reached for the total injury population after 6–24 months [24, 26, 31, 36, 44, 46, 47, 49, 55, 60, 62]. Figures 4 and 5 summarise HRQL scores of all articles that provided a mean HRQL score at 12 months after injury. Some articles provided mean scores only per subgroup, and have therefore been included in the figure for each subgroup. Figure 4 shows the physical component score (PCS) and mental component score (MCS) for both SF-12 and SF-36, whereas Fig. 5 shows the summary score for the EQ-5D, EQ-VAS, HUI3 and PedsQL (4.0).

Fig. 4 SF-12 and SF-36 scores at 12 months after injury. Note 1: The y-axis shows the mean scores, not utility values. Note 2: The size of the dots is proportional to the sample size

Fig. 5 EQ-5D, PedsQL (4.0), HUI3 and EQ-VAS scores at 12 months after injury. Note 1: The y-axis shows descriptive summary scores only, not utility values. Scores are not directly comparable due to the different HRQL measures used. Note 2: Scale from 0 to 100 for PedsQL (4.0) and EQ-VAS; scale from 0 to 1 for EQ-5D and HUI3 (score multiplied by 100). Note 3: The size of the dots is proportional to the sample size of the study

Discussion

This systematic review aimed to provide an update on studies measuring HRQL with a generic instrument in general injury populations since the publication of an earlier review examining injury studies conducted between 1995 and 2009 [12]. Given the increase in the number of studies conducted in this area over recent years, our review focused specifically on studies that examined HRQL at more than one time point. As with the earlier review, considerable methodological variation across studies was found; differences were apparent in study settings, injury severity of participants, HRQL instruments used, follow-up periods, and timing of HRQL assessments. The most commonly used instruments to assess HRQL included the SF-36, SF-12, and EQ-5D, although 14 different instruments were applied across the 29 studies included in this review. Study follow-up points ranged from 1 month to 10 years post-injury, with follow-up assessments most commonly occurring at 6, 12 and 24 months after injury.

Despite the variation across studies included in this review, it is important to note that improvement in the consistency of study designs was observed since the earlier review of studies measuring HRQL in general injury populations [12]. Our review found a greater number of studies that had employed a longitudinal design over a shorter review period; we identified 29 longitudinal studies over a 9 year period in contrast to the 21 longitudinal studies published across the 14 years examined by Polinder et al. Our updated review also found that longer durations of follow-up have been utilised, with four studies examining HRQL beyond 24 months, and one up to 10 years post-injury. This is in contrast to the earlier review where many studies had examined outcomes until 6 months only, and none had examined outcomes beyond 24 months. These findings demonstrate an increase in adherence to the recommendations of the European Consumer Safety Association [16], which recommends assessments be conducted to a minimum of 12 months post-injury.

While longer follow-up periods are occurring in studies examining HRQL in general injury populations, the timing of assessments continues to vary across studies. The 2007 guidelines recommend assessments at regular intervals of 1, 2, 4 and 12 months post-injury, allowing for examination of the four phases of trauma recovery: acute treatment phase, rehabilitation phase, adaptation phase, and stable end situation [16]. Only five studies completed follow-ups at these time points [48, 49, 51, 64, 65], although three completed assessments at four different times in the 12 months after injury [50, 53, 58], and three examined outcomes at least four times over a longer period (beyond 12 months) [26, 40, 42]. There may be important reasons for researchers selecting different times of outcome assessment than those recommended. For example, examination beyond the 12 month point is likely to be important given accumulating evidence that changes (including improvements and deteriorations) in health status can continue to be detected after this time [59, 60]. Ensuring that participant burden is kept to a minimum is likely to be another important consideration.

Guidelines for the examination of health status among injury populations also recommend the inclusion of a retrospective recalled assessment of pre-injury health [16, 66]. Few studies in our review met this criterion, despite evidence that such retrospective measurements are likely to be more appropriate than comparisons with general population norms when evaluating post-injury losses in HRQL [9, 67]. This is because individuals from the general population are unlikely to be representative of those from an injured population [68]. A systematic review of studies collecting pre-injury HRQL data among injury patients has demonstrated that both general population comparisons and retrospective assessments are likely to result in biased estimates of pre-injury HRQL [69]. However, prospective HRQL data are often impractical to collect prior to an injury occurring. Instead, it may be most feasible to collect retrospective assessments of pre-injury HRQL as soon as practicably possible after injury.

The identification of 14 different instruments to evaluate HRQL across the 29 studies included in this updated review suggests that there remains significant variation in the types of measures used. However, it is important to recognise that this variation has decreased substantially since the earlier systematic review of studies evaluating HRQL after injury, from which 24 different generic HRQL and functional status measures were extracted. This indicates that the potential to make comparisons across studies is increasing. While a number of studies employed the EQ-5D in isolation, no studies used both the EQ-5D and the HUI3 to evaluate HRQL, which is recommended in the guidelines [16]. Many studies used neither the EQ-5D nor the HUI3, instead employing the SF-12 or SF-36 to assess HRQL. Understanding motivations behind the selection of instruments to examine HRQL and disability outcomes after injury is an important avenue for future research. Different outcome measures focus more or less on specific HRQL dimensions and the dimensions of interest to researchers may vary across countries depending on the aspects of health that are most relevant to each unique social, cultural, and political context.

Included studies varied in the reporting of HRQL information. While some studies reported the proportion of people experiencing problems with particular HRQL and disability domains, others reported summary or utility scores. The 14 studies included in the review reporting summary scores represent only a slight increase from the 12 studies that did so in the earlier review.

As with the earlier review, our review found that generic instruments are capable of detecting changes in HRQL between discharge and follow-up. Despite continuing variation in study design, it is evident that the greatest gains in health status are observed in the first 12 months after injury. Gains can also be observed in the following 12 months (up to 24 months post-injury) among individuals who have sustained serious injuries (as indicated by injury severity scores and hospitalisation status). Although these gains can be detected, many studies concluded that HRQL remains significantly reduced in comparison to pre-injury levels or population norms, and this is evident up to 10 years after injury [60]. While these insights are important, continued variation in assessment time points, study populations, HRQL instruments, and the reporting of HRQL outcomes makes it difficult to compare findings from individual studies, and reduces the precision of knowledge regarding the global impact of injury on population health over time.

An important limitation associated with this systematic review is that only peer-reviewed published literature was included. It is possible that other longitudinal studies examining HRQL in large injury populations have been conducted but not published. Another limitation is that studies that examined HRQL or disability were eligible for inclusion in the review, and although these constructs are related, they are not synonymous. Despite these limitations, the review provides important insight into the design and findings of studies published since 2010. The variation observed across included studies suggests that the European Consumer Safety Association guidelines for the conduction of follow-up studies may be difficult for researchers to adhere to. Further research is needed to explore the reasons why researchers are not following these guidelines. This information could be used to inform the development of updated guidelines that are feasible to follow when taking into account the significant contextual variation that exists across different countries and populations. This, in turn, may lead to increased consistency in study designs and outcome reporting, allowing for meaningful cross-country comparisons.

Conclusions

Although increased consistency in studies designed to investigate HRQL in general injury populations has been observed since 2010, there remains significant variation that makes comparisons across studies difficult and prevents precise estimates of the impact of injury on global health. Exploring reasons for variation in study design and reporting of outcomes is an important avenue for future research that may inform the development of updated guidelines for the conduct of follow-up studies measuring HRQL and disability outcomes among individuals with injury.