Introduction

Sickness absence remains a significant cost to developed countries, accounting for losses of between 0.5% and 2% of GDP in European countries alone [1]. There is evidence that most employees who take a period of absence spend only a short time away from the workplace [2]. However, across all working ages a small minority go on to take longer-term absence, variously defined as more than 4 weeks, more than 6 weeks or up to 3 months [3,4,5]. Whichever definition is used, it is this small proportion of people going on to long-term absence who account for the majority of the costs associated with absence from the workplace [6].

At present, it is difficult to predict which employees, in particular those with musculoskeletal pain, will return to work quickly without additional vocational advice and support, which will require this support, and what level of support is most appropriate. Consequently, there is no way of ensuring that the right individuals are directed towards the right services to support their occupational health needs. There is a growing evidence base for the usefulness of stratified care approaches to delivering healthcare, whereby prognostic information is used to allocate individuals to sub-groups with matched recommended treatments or interventions [7, 8]; stratified care has also been shown to be cost-effective [9, 10]. This approach has not yet been developed in occupational health, but its principles could be used to ensure that scarce occupational health resources are targeted towards those individuals who need more support, whilst also providing reassurance to those whose sickness absence is unlikely to become longer term. To allow a stratified care approach to be developed, it is important to identify which factors predict work absence and to examine the utility of current prognostic models or tools [11].

It is anticipated that prognostic factors for work absence will be varied. Sickness absence is a complex concept, influenced not only by an individual’s health (or the severity of their health condition), but also by psychosocial variables, macro system variables (e.g. health services and workplace systems) and wider societal systems (e.g. sickness benefit policies) [12]. For many individuals, decisions about sickness absence will be made in the context of their own health, their own workplace and their own attitudes and beliefs. To manage the variety of prognostic variables anticipated in this systematic review, it may be possible to identify some common “core” concepts that can be used to predict the likelihood that individuals will go on to longer-term absence. These concepts can be organised around a framework such as the disability prevention framework [13], which structures the influences on the relationship between health and work into the “core” concepts of personal systems, healthcare systems, workplace systems and compensation systems. Within each of these core concepts are sub-groups that allow the potential predictors of work absence to be examined at a more granular level.

Whilst there is a body of literature examining predictors of sickness absence [14,15,16], no systematic review has comprehensively considered which factors are predictive of work absence or the usefulness of prognostic models or measurement tools in identifying those who will have longer-term work absence. Furthermore, there is no evidence focussed on predicting absence duration in those who are already absent from work and presenting to primary care. This is a key time point at which evidence-based advice and guidance can be provided, or patients referred to appropriate services to support them with their health and work, particularly those with long-term conditions such as musculoskeletal pain (NICE 2019). Therefore, the primary aim of this review is to identify prognostic factors for the duration of work absence in those already absent and to examine the utility of prognostic models for work absence.

Methods

This systematic review is reported using the PRISMA guidance [17] and the recommendations of Riley et al. [18] for undertaking systematic reviews of prognostic factors. The review was prospectively registered with PROSPERO (CRD42020219452).

Search Strategy

An experienced information specialist designed and conducted the searches using a combination of subject headings and key text words. The full search strategy is reported in the online supplement. The following eight databases were searched from their inception to 6th October 2020: MEDLINE; EMBASE; CINAHL; AMED; PsycINFO; HMIC; Business Source Complete; and the Cochrane Library (CENTRAL). A full updated search was run on 18th September 2023.

Inclusion and Exclusion Criteria

Participants/Population

Studies including employed adults who were on sick leave and seeking or receiving healthcare for a musculoskeletal condition were included. Studies of participants who were unemployed, not on sick leave, or working modified or alternative duties were excluded. Studies were also excluded where participants did not have musculoskeletal conditions, where these conditions resulted from acute trauma or injury (such as fractures), where participants had inflammatory arthritis, or where participants had undergone surgical intervention for their condition.

Study Setting

Studies set in primary (first contact) care, community care and workplace settings where employees had sought healthcare were included. Studies conducted in hospital populations, emergency care, tertiary care or rehabilitation centres were excluded.

Study Type

Cohort studies (prospective and retrospective) with an integrated health and work focus were included. Additionally, prognosis studies based on randomised controlled trial data and/or case–control studies were included, as were papers that reported on tools or models used to predict work absence and summarised the predictive performance of the tool or model used. All other study designs were excluded.

Prognostic Factors

The predictive performance of all identified prognostic factors and prognostic models was evaluated. We did not limit the factors that could be included, allowing a full exploration of the breadth of prognostic factors examined in the literature.

Outcomes

The outcome of interest for this review was work absence. Prognostic factors for return to work (RTW) will be reported in a separate publication. Work absence could be measured in any way (e.g. self-report, employer records or insurance records) and at any follow-up time point. Definitions of absence were extracted from the studies to allow comparison of outcome measures.

The strength of association between individual prognostic factors and the outcome was extracted from each study. Where the outcome was binary (absent from work, yes versus no), the odds ratio, relative risk or time-to-event data were extracted; where the outcome was continuous (e.g. number of days absent from work), mean differences were extracted.

Screening and Data Extraction

All screening and data extraction were undertaken independently by pairs of review authors. Any disagreements were resolved through consensus, bringing in a third reviewer if necessary. Titles and abstracts were screened using Rayyan software, and full-text screening and data extraction were undertaken using Covidence software.

A standardised data extraction form was developed and tested in MS Excel before being used to extract data from the included studies. Study-level data were collected on study design and setting (primary care, community, population/national based, health records (primary care), health records (secondary care), health records (insurance), occupational health, outpatients, hospital/rehabilitation, other secondary care and other setting (defined)). Data were also collected on inclusion criteria, population description, definition of outcome, outcome data type (binary, continuous, time to event), follow-up period, prognostic factor and description, variables used for adjustment in the analyses, and adjusted and unadjusted estimates of the association between the prognostic factor and the outcome. Where studies reported on a model, measures of the model’s performance were also extracted.

Quality Assessment

The Quality In Prognosis Studies (QUIPS) tool [19] was used to assess potential bias in prognostic factor studies and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [20] for studies reporting prognostic models. Two authors independently rated each study as at low, high or unclear risk of bias for each domain of the relevant tool (QUIPS or PROBAST). Ratings were compared within each pair and any disagreements resolved through discussion or, if necessary, by consulting a third reviewer.

Assessment of the Strength of Evidence

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach was used to assess the strength of the evidence. This method takes a number of factors into account, allowing a judgement to be made on the body of evidence as a whole rather than on individual studies, as with risk of bias assessment.

For each of the groups of prognostic factors reported below, GRADE was used to assess risk of bias, with evidence downgraded where more than half of the studies had moderate or high risk of bias. Evidence was also downgraded where there was inconsistency in estimates of effect and/or heterogeneity between studies in the definition of the prognostic factor. Downgrading was also applied for any indirectness, defined as follows: not all participants were absent and separate results were not reported for those who were; only a subset of the population was represented (e.g. only males or only females); or the prognostic factor was not fully represented (e.g. only a subset of those reporting absence were included). Finally, evidence was downgraded for imprecision if there were fewer than two studies in a prognostic factor grouping or if most studies had an insufficient sample size to detect a difference for the prognostic factor.

An adapted GRADE was used to assess the strength of the evidence for the prognostic models predicting absence from work. Following the guidance of Foroutan et al. [21], the adaptation was made primarily to ensure that the performance of the included models was appropriately considered. Evidence was downgraded where calibration was imprecise, with wide variation in point estimates overall and wide confidence intervals.

Evidence was deemed to be high quality if none of the domains were downgraded, moderate quality if one of the domains was downgraded, low quality if two were downgraded and very low quality if three or four were downgraded.
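To make the downgrading rules above concrete, the following Python sketch encodes them for a single category of prognostic factors. It is a hypothetical illustration, not the review team’s tooling: the `CategoryEvidence` fields and the function names are invented here, each domain is reduced to a yes/no judgement, and the calibration domain used in the adapted assessment of prognostic models is not included.

```python
from dataclasses import dataclass

# Overall quality by number of downgraded domains, as stated in the text:
# 0 -> high, 1 -> moderate, 2 -> low, 3 or 4 -> very low.
QUALITY_BY_DOWNGRADES = ["high", "moderate", "low", "very low", "very low"]


@dataclass
class CategoryEvidence:
    """Hypothetical per-category summary; field names are illustrative only."""
    n_studies: int
    n_moderate_or_high_rob: int   # studies at moderate or high risk of bias
    inconsistent: bool            # inconsistent estimates or heterogeneous factor definitions
    indirect: bool                # any of the listed indirectness conditions applies
    most_underpowered: bool       # most studies too small to detect a difference


def downgraded_domains(ev: CategoryEvidence) -> int:
    """Count how many of the four domains are downgraded."""
    risk_of_bias = ev.n_moderate_or_high_rob > ev.n_studies / 2  # "more than half"
    imprecise = ev.n_studies < 2 or ev.most_underpowered          # fewer than two studies
    return sum([risk_of_bias, ev.inconsistent, ev.indirect, imprecise])


def grade_quality(ev: CategoryEvidence) -> str:
    """Map the number of downgraded domains to an overall quality label."""
    return QUALITY_BY_DOWNGRADES[downgraded_domains(ev)]


if __name__ == "__main__":
    example = CategoryEvidence(n_studies=9, n_moderate_or_high_rob=5,
                               inconsistent=True, indirect=False, most_underpowered=False)
    print(downgraded_domains(example), grade_quality(example))  # 2 low
```

For example, a hypothetical category of nine studies, five of which are at moderate or high risk of bias, with inconsistent estimates but no indirectness or imprecision, has two downgraded domains and is therefore graded low.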

Data Synthesis

A narrative synthesis was planned to allow for variation in the outcome measures, settings and prognostic factors of the included studies. Although the Popay narrative synthesis framework [22] had been planned, the more recent Synthesis Without Meta-analysis (SWiM) framework [23] was used to structure the data synthesis. This framework provides a guide with which to group, describe and report the results of a systematic review and was considered a more appropriate approach to synthesising the evidence in this review.

Grouping of Prognostic Factors

Due to the wide variation in prognostic factors measured within the studies in this review, they were grouped into broad domains. In total there were 110 individual factors identified which were grouped via discussion within the team into 17 broad categories. These categories were further grouped for synthesis to broadly fit the categories of the Disability Prevention Framework [13]; however, there were no variables that could be grouped into the compensation system concepts and just one variable reporting a healthcare prognostic factor (Fig. 1).

Fig. 1

Overarching groups and categories of prognostic factors

Description of Standardised Metric

This paper aimed to identify prognostic factors for work absence, and therefore a range of metrics were extracted and recorded. For binary outcomes, odds ratios (OR), relative risks (RR) and risk reductions were recorded; for continuous outcomes, mean differences were recorded; and for time-to-event outcomes, hazard ratios (HR) were recorded. Where available, both adjusted and unadjusted effect estimates were recorded. These metrics for reporting prognostic factors are recommended in the CHARMS-PF checklist [24, 25].
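As an illustration of how the binary-outcome metrics relate to the underlying data, the sketch below computes an odds ratio and a relative risk, each with a 95% confidence interval on the log scale, from a hypothetical 2 × 2 table (prognostic factor present/absent by absent from work yes/no). The counts and function names are assumptions for illustration; most included studies reported adjusted estimates from regression models, which cannot be reproduced this way, and hazard ratios require time-to-event models such as Cox regression.

```python
import math
from typing import NamedTuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


class Estimate(NamedTuple):
    point: float
    lower: float
    upper: float


def _log_scale_ci(log_est: float, se: float) -> Estimate:
    """Back-transform a log-scale estimate and its 95% CI."""
    return Estimate(math.exp(log_est),
                    math.exp(log_est - Z_95 * se),
                    math.exp(log_est + Z_95 * se))


def odds_ratio(a: int, b: int, c: int, d: int) -> Estimate:
    """Odds ratio with a Woolf (log) 95% CI.

    a: factor present, off work      b: factor present, not off work
    c: factor absent, off work       d: factor absent, not off work
    Assumes all cells are non-zero (no continuity correction is applied).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return _log_scale_ci(log_or, se)


def relative_risk(a: int, b: int, c: int, d: int) -> Estimate:
    """Relative risk (risk ratio) with a log-scale 95% CI; same cell layout as above."""
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return _log_scale_ci(log_rr, se)


if __name__ == "__main__":
    # Hypothetical counts: 30/100 off work with the factor, 15/100 off work without it
    print(odds_ratio(30, 70, 15, 85))
    print(relative_risk(30, 70, 15, 85))
```

With these hypothetical counts the odds ratio is (30 × 85)/(70 × 15) ≈ 2.43 while the relative risk is 0.30/0.15 = 2.0. The odds ratio lies further from 1 because the outcome is common, which is one reason the two metrics should not be compared directly across studies.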

Methods of Synthesis

There was substantial inconsistency across prognostic factors in terms of measurement and analysis, and also in outcome measures, so a formal meta-analysis was not possible. However, the data were sufficient to report the range and distribution of observed effects, to identify whether there was evidence of an effect in one or more studies examining the same prognostic factor, and to explore the direction of any effects seen.
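One way to operationalise this kind of synthesis, in line with the methods listed in SWiM (summarising effect estimates and vote counting based on direction of effect), is to report the range and median of the estimates for a factor, count how many point towards more or less absence, and optionally apply a sign test. The review does not state that a sign test was used, so the Python sketch below is an assumed illustration only; the function name and the example estimates are hypothetical.

```python
import math
from statistics import median
from typing import Sequence


def summarise_direction(ratios: Sequence[float]) -> dict:
    """Summarise ratio estimates (OR, RR or HR) for one prognostic factor without pooling.

    Hypothetical helper: reports the range and median of the estimates, counts how many
    point above or below 1, and gives an exact two-sided sign test p-value.
    """
    higher = sum(1 for r in ratios if r > 1)  # factor associated with more absence
    lower = sum(1 for r in ratios if r < 1)   # factor associated with less absence
    n = higher + lower                        # estimates of exactly 1 carry no direction
    if n:
        k = min(higher, lower)
        tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
        p_value = min(1.0, 2 * tail)
    else:
        p_value = 1.0
    return {
        "range": (min(ratios), max(ratios)),
        "median": median(ratios),
        "n_higher_risk": higher,
        "n_lower_risk": lower,
        "sign_test_p": p_value,
    }


if __name__ == "__main__":
    # Hypothetical estimates for one factor from four studies
    print(summarise_direction([1.10, 1.25, 1.30, 0.95]))
```

Before counting, estimates need to be oriented so that values above 1 always mean more absence (for example, by inverting hazard ratios for return to work). Direction counts also ignore the size and precision of each estimate and the different adjustment sets used across studies, so they show only whether effects tend to point the same way.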

Results

The searches returned 1655 references. Following de-duplication 1609 references remained. After completing screening of titles and abstracts, 358 full texts were retrieved for assessment of eligibility with 23 studies included in the current systematic review and a further 48 studies identified for inclusion in a separate review reporting RTW as the outcome (Fig. 2).

Fig. 2

Flow diagram of study selection. From Page et al. [17]. For more information, visit: http://www.prisma-statement.org/

Results: Prognostic Factors

Across all 23 studies, 110 individual prognostic factors were identified. There was considerable inconsistency in study design, prognostic factor measurement, outcome measurement, time point of follow-up and analysis methods across the included studies, meaning that a meta-analysis was not appropriate. Prognostic factors were grouped into 17 themes for ease of management and reporting (Fig. 1), which can be considered within three domains of the disability prevention framework: personal system, workplace system and healthcare system.

Description of Included Studies: Prognostic Factors

Table 1 provides an overview of the included studies. In summary, the studies were mainly conducted in North America (9 in Canada and 4 in the USA), followed by the Netherlands (3 studies) and Australia (3 studies), with the rest from other European countries (5 studies).

Table 1 Study characteristics (prognostic factors)

The majority of studies were conducted using records from healthcare insurance databases (10 studies) with 3 studies conducted in primary care, 4 in occupational health and the remaining 6 studies from other settings. Prospective cohort studies were the most common design (15 studies) with retrospective cohorts (n = 5) and health record reviews (n = 2) being less frequently employed.

The outcome measure of work absence was defined differently in every study. Although the number of days of absence from work was the most commonly used metric, it was calculated differently across studies, ranging from:

  • A simple count of days from company records, as reported by Abenheim et al. [26] and Bosman et al. [27].

  • Working or not working at 6 months, as reported by Okurowski et al. [28].

  • More complex calculations, such as that reported by Nordin et al. [29], where the days of absence were recorded from phone interviews with participants, or where compensated days were calculated, as reported by Abenheim et al. [26] and Lederer et al. [30].

The length of follow-up also varied, ranging from 6 months or less in five studies [28, 29, 31,32,33], 12 months in five studies [34,35,36,37,38] and 2 years in four studies [26, 39,40,41], to longer-term follow-up at 3 [42] and 5 years [43] in two studies. One study, Shiels et al. [44], did not clearly report the duration of absence, although participants were reported to have been followed up for more than 1 month.

Most of the studies reported prognostic factors only; however, 13 prognostic models were identified [27, 28, 33, 35, 37, 40,41,42, 45,46,47,48,49]. Some were existing models being tested in new populations and others were developed within a specific population.

Risk of Bias: Prognostic Factors

The summary judgements for each domain of the QUIPS tool are reported in Fig. 3. Six of the included studies had at least one domain considered at high risk of bias [28, 36,37,38,39, 42], and a further study was considered at high risk overall because of the number of domains rated at moderate risk [41]. The most common reason for a high risk of bias was study attrition, either because a large number of participants were lost to follow-up or because studies did not report attrition or its potential effect on their findings. The high risk of bias of these studies is reflected in the GRADE assessment. Just three studies were considered at low risk of bias [26, 35, 40]; the remaining five studies were at moderate risk of bias, most commonly because of a lack of consideration or reporting of potential confounding.

Fig. 3

Risk of bias (QUIPS)—domain summary assessments

Strength of the Evidence: Prognostic Factors

GRADE was used to assess the strength of the evidence, reported by grouping the prognostic factors into the 17 broad categories described above (Fig. 1). For each category, the supporting research was assessed using adapted GRADE criteria and an overall judgement was agreed (Table 3).

None of the categories was judged to have strong evidence: the strength of the evidence was low, and one theme, “function”, was graded very low. The only factor that demonstrated an overall protective effect was age, with increasing age associated with a lower risk of absence (reported in 9 studies) [28, 32, 35, 36, 38,39,40,41, 44]. All other themes were associated with a higher risk of absence; however, comparisons within and between themes were impeded by the differing measures used across the studies.

Summary of Findings by Personal Systems

Most of the categories within personal systems (Fig. 1) were reported using inconsistent measures and outcomes and therefore gave a very mixed picture of their contribution to predicting work absence. Some categories warrant fuller reporting because their effects tended in a more consistent direction: age, sex, recovery expectations and previous work absence.

Age

Age was measured in many different ways across the studies; however, it demonstrated an overall protective effect, with increasing age associated with a lower risk of absence. For example, Steenstra et al. [48] reported age in 10-year bands from 15–25 years through to 55–65 years and found a dose-response effect for time on benefits when compared with the 25–35-year age group: the 15–25-year age group had increased absence (hazard rate ratio 1.27; 95% CI 1.00, 1.60), and the risk of absence decreased with increasing age in the older age groups.

Sex

Studies reporting sex as a prognostic factor showed no consistent direction of effect [32, 35, 39,40,41, 44, 49]. For example, Abasolo et al. [39] found that women were less likely than men to experience temporary work disability (HR 0.84; 95% CI 0.78, 0.90); however, there was no difference in recurring work disability (HR 1.13; 95% CI 0.97, 1.32). Richter et al. [35] reported that men were more likely than women to experience absence at follow-up, although not statistically significantly (HR 1.59; 95% CI 0.78, 1.22). Steenstra et al. [41] found that women were more likely to experience a recurrence of absence over two-year follow-up (hazard rate ratio 1.36; 95% CI 1.09, 1.70).

Recovery Expectations

Four of the included studies reported on recovery expectations; broadly, the better a participant’s recovery expectations, the better the outcome [33,34,35, 48]. For example, Turner et al. [33] reported a dose-response effect, with a recovery expectation of 0 (on a 0–10 scale) having an odds ratio of 9.18 (95% CI 5.00, 16.84) for 6-month work disability (defined as number of days on wage replacement) when compared with those with a very high recovery expectation. This odds ratio fell to 1.95 (95% CI 1.18, 3.20) for those reporting a high recovery expectation of 8–9 on the same scale. However, it should be noted that Turner et al. did not find a protective effect of recovery expectations. In the study by Richter et al. [35], people who were unable to identify when they would return to work had a poorer outcome (HR 0.23; 95% CI 0.15, 0.34), as did those who expected to return to work more than a month later (HR 0.24; 95% CI 0.15, 0.38), indicating that poorer recovery expectations were associated with a reduced “risk” of getting back to work.

Previous Absence

Four studies reported on previous work absence and whether it can predict future absence [32, 38, 39, 41]. All studies measured previous work absence as a previous “claim”, and the general direction of effect was that previous work absence predicted future work absence. For example, Lederer et al. [32] reported a HR of 0.91 (95% CI 0.87, 0.94) for previous claim history in the past 5 years (for return to work), and van Duijn et al. [38] found that prior sick leave (in the past 12 months) had a HR of 1.50 (95% CI 1.03, 2.17) in univariate analysis, although this variable was not included in the multivariable analyses.

There was some evidence that mental health may contribute to absence: Turner et al. [33] reported that mental health below the population mean (measured using the SF-36 v2) was associated with increasing absence, although these results were not statistically significant (< 2 standard deviations (SD) below the mean: OR 1.59, 95% CI 0.82, 2.08; 1–2 SD below the mean: OR 1.84, 95% CI 0.99, 3.42; < 1 SD below the mean: OR 1.66, 95% CI 0.91, 3.03). Turner et al. found no effect of catastrophising or blame on absence. Richter et al. [35] reported that a high fear of movement was not associated with absence (OR 0.94; 95% CI 0.67, 1.33); however, Turner et al. [33] found that high fear avoidance, scoring 5–6 points on the Fear Avoidance Behaviour Questionnaire, was associated with absence (OR 4.64; 95% CI 1.57, 13.70), with an OR of 2.96 (95% CI 0.98, 8.90) for scores of 3–5.9 points.

General health and quality of life were reported by Richter et al. [35] (general health assessed with a single question, good versus poor) and Selander et al. [36] (using the SF-36), but there was no evidence of a relationship with work absence. Van Duijn et al. [38] found that participants who reported their musculoskeletal pain to be a chronic condition were more likely to experience absence (OR 1.6; 95% CI 1.2, 2.32); however, none of the other studies examined this prognostic factor.

Pain was measured by three studies, Richter et al. [35], Lötters et al. [34] (low back pain and “other” MSK pain) and van Duijn et al. [38], using a 0–10-point Likert scale, and all indicated that an increase in pain was significantly associated with work absence, with effect sizes (OR) ranging between 1.1 and 1.3.

Steenstra et al. [41] demonstrated a dose-response effect with worsening functional ability, measured using a 0–4 scale, associated with time on absence benefits and risk of absence recurrence.

Summary of Findings by Workplace Systems

Work schedule was examined by Abasolo et al. [39], who reported that being self-employed was protective against absence, whilst having an indefinite work contract or being a “general” worker was associated with poor absence outcomes. Abasolo et al. [39] also found some indication that specific work demands related to movement, e.g. frequent kneeling or flexion and rotation of the trunk, were associated with absence; however, the effect sizes, whilst generally significant, were very small (OR between 1.05 and 1.39). Work culture was assessed by two studies using different measures; both indicated that poor relationships at work and employer doubt about pain were indicators of absence [33, 41]. Richter et al. [35] found that dissatisfaction at work was also associated with absence. However, the availability of modified duties during sick leave [38] and continued salary during absence [41] were also indicative of absence.

Results: Prognostic Models

There was some overlap between studies reporting individual prognostic factors and those developing prognostic models. Overall, 13 prognostic models were identified; some were existing models being tested in new populations and others were developed within a specific population.

Description of Included Studies: Prognostic Models

Table 2 reports the characteristics of the prognostic model studies. The majority were undertaken using insurance health records (9 studies in total [28, 35, 37, 40, 41, 46,47,48,49]), with two studies in occupational health settings [27, 45] and one each in a general population [33] and a primary care setting [42]. As with the prognostic factor papers, the measures of absence varied: there was no consistency in reporting, and every study used a different outcome measure.

Table 2 Study characteristics (prognostic models)

Reporting of Models

There was wide variation in the reporting of the models included in the review (Table 3). Multivariable logistic regression was used by 3 studies [27, 28, 46] with logistic regression also reported by 3 studies [33, 42, 49] and one study reporting negative binomial regression [40]. A further 5 studies reported that Cox regression had been used for analysis [35, 37, 41, 45, 48].

Table 3 GRADE assessing strength of the evidence for predicting absence (prognostic factors)

Validation was carried out in just over half of the included studies, with 7 studies reporting that internal validation had been undertaken; validation was not reported in the remaining 6 studies [28, 33, 35, 40, 42, 46].

There was no consistency in the reporting of the models’ performance. Most studies reported the area under the curve or c statistic [27, 28, 37, 41, 42, 46,47,48], and some also reported sensitivity and specificity [27, 37, 42, 47] or positive and negative predictive values [27, 28, 47]. Five studies did not report any measure of their models’ performance [33, 35, 40, 45, 49].
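For readers less familiar with these measures, the following Python sketch shows how the c statistic and the threshold-based measures are calculated from observed outcomes (1 = absent at follow-up) and a model’s predicted risks. The data, the 0.5 threshold and the function names are hypothetical; the included studies will have used their own software and cut-points.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Concordance (AUC): proportion of (event, non-event) pairs in which the person
    who became absent had the higher predicted risk; tied predictions count as half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def threshold_measures(y: Sequence[int], p: Sequence[float], threshold: float) -> dict:
    """Sensitivity, specificity, PPV and NPV when risks >= threshold are classed as high risk."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    observed = [1, 1, 1, 0, 0, 0]                # hypothetical: 1 = absent at follow-up
    predicted = [0.9, 0.7, 0.4, 0.6, 0.55, 0.2]  # hypothetical predicted risks
    print(c_statistic(observed, predicted))       # 7/9, about 0.78
    print(threshold_measures(observed, predicted, threshold=0.5))
```

Unlike the c statistic, PPV and NPV depend on the proportion of the sample who go on to longer-term absence, so they are not directly comparable across populations with different absence rates.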

None of the prognostic model papers reported the calibration of the models developed, so no observed:expected ratios or calibration slopes were presented, and no assessment of the calibration of the included models could therefore be made.
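The two calibration measures named above can be sketched as follows. The observed:expected (O:E) ratio compares the total number of observed events with the sum of predicted risks, and the calibration slope is the coefficient b in a logistic regression of the observed outcome on the logit of the predicted risk, fitted here with a small hand-written Newton–Raphson loop. This Python sketch is illustrative only: the function names and data are hypothetical, and it assumes predicted risks strictly between 0 and 1, at least two distinct predicted risks and no perfect separation.

```python
import math
from typing import Sequence


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Observed events divided by the sum of predicted risks (1.0 = calibrated overall)."""
    return sum(y) / sum(p)


def calibration_slope(y: Sequence[int], p: Sequence[float], max_iter: int = 50) -> float:
    """Slope b of the recalibration model logit(P(y = 1)) = a + b * logit(p).

    Fitted by maximum likelihood with Newton-Raphson. b < 1 suggests predictions that
    are too extreme; b > 1 suggests predictions that are too moderate.
    """
    x = [math.log(pi / (1 - pi)) for pi in p]  # predicted risks on the logit scale
    a, b = 0.0, 1.0                            # start at "perfect calibration"
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu              # score for the intercept
            g1 += (yi - mu) * xi       # score for the slope
            h00 += w                   # entries of the 2 x 2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # Newton step: inverse information times score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return b
```

For example, if 10 people with a predicted risk of 0.2 include 1 event and 10 people with a predicted risk of 0.8 include 9 events, the O:E ratio is 1, yet the calibration slope is ln 9 / ln 4 ≈ 1.58, indicating predictions that are not extreme enough. A model can therefore look well calibrated overall while being miscalibrated across the range of risk.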

Risk of Bias: Prognostic Models

Figure 4 presents the overall judgements of risk of bias based on the domains of the PROBAST tool. Overall, 62% of studies had a low risk of bias and also performed well in the domains assessing participants, predictors and outcomes. The main area of concern was the analysis domain, in which 38% of studies were at high risk of bias; this was often because individual studies reported too little information for the conduct of the analysis to be assessed.

Fig. 4

Risk of bias (PROBAST) summary judgements

Strength of the Evidence: Prognostic Models

Using the adapted GRADE approach to take account of the performance of the included prognostic models, the strength of the evidence for their use was judged to be low (Table 4). This was primarily due to poor reporting of the models’ performance and to one study that included a small percentage of participants who were not absent from work, which led to downgrading for indirectness [42]. Whilst the threshold for downgrading for risk of bias was not met, it is worth noting that five of the 13 studies had a high or unclear risk of bias (below the threshold of more than half required to downgrade).

Table 4 GRADE assessing strength of the evidence for predicting absence (models)

Discussion

Summary of Main Results

A total of 23 studies were included in this review, all reporting on prognostic factors for work absence in populations with musculoskeletal pain who were absent from work; 13 of these studies also reported prognostic models aimed at predicting absence from work. A total of 110 individual prognostic factors were identified and grouped into personal systems and workplace systems, in line with the Disability Prevention Framework. Within this overarching framework, prognostic factors measuring the same concept were grouped into categories for ease of comparison (Fig. 1). Overall, the strength of the evidence for both prognostic factors and prognostic models was low to very low. This grading reflects the heterogeneity of the studies, in which prognostic factors, outcomes and the timing of outcome measurement all differed; furthermore, the reporting of model performance was mixed, with different statistics reported or performance measures not reported at all. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement was published in 2015 [50] and so was available to the authors of just two of the included papers [27, 48], which may account for the reporting issues seen when the prognostic models were synthesised.

Study Strengths

We have followed the recommendations of the appropriate reporting checklists, including the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [24]. Furthermore, we have followed the guidance of Damen et al. [51], who provide a step-by-step guide to conducting systematic reviews of prognostic model studies, including assessment of the performance of the models.

This review has comprehensively searched the literature on prognostic factors for work absence in those with musculoskeletal conditions. By considering a priori how to group or categorise prognostic factors using the Disability Prevention Framework [13], we were able to make sense of the large number of prognostic factors identified. Framing specific groups of prognostic factors within this framework mitigated the impact of the heterogeneity of the studies: whilst we were unable to compare “like with like”, we were able to assess the concepts overall and consider their contribution to predicting work absence.

The use of GRADE, adapted to assess the strength of the evidence in prognostic factor and prognostic model studies, has allowed summary judgements to be made and has highlighted the inconsistencies in measurement and reporting across the included studies. Adapting GRADE to include an assessment of the prognostic models’ performance has ensured that all available and pertinent data have been incorporated into the assessment of the strength of the evidence [21].

Study Limitations

There are some limitations to the current study, principally related to the heterogeneity of the included studies. Whilst an individual patient data meta-analysis is often considered the gold standard (Cochrane https://methods.cochrane.org/ipdma/about-ipd-meta-analyses) for assessing the influence of a factor on an outcome, it is not always possible when the quality of the studies is low. Because the included studies were heterogeneous and had controlled for different potential confounders, we were unable to undertake any kind of meta-analysis, nor would this have been wholly appropriate for this type of review. To address this and make sense of the varied measurements and outcomes, we aimed to categorise prognostic factors a priori and, as far as possible, to assess the contribution of each category to predicting sickness absence. It was therefore important to ensure that the synthesis of findings was as structured and transparent as possible. We had planned to use a narrative synthesis [22] but judged the Synthesis Without Meta-analysis (SWiM) framework to be more suitable for this review, as it provides a guide with which to group, describe and report the results of systematic reviews. The SWiM framework offers a more transparent account of how the studies’ findings were synthesised, allowing the findings to be described clearly and a more standardised approach to be followed when considering metrics and summaries of data.

Comparison with Other Studies

There are a number of similar reviews that focus on narrower populations. Kuijer et al. [52] reviewed the literature on the prediction of sickness absence in patients with chronic low back pain and found the same problems as those identified in our review: variable measurement of predictors, variable timing of follow-up and differing definitions of outcome. Kuijer et al. [52] concluded that no common set of core variables could be used to predict work absence in this specific population with chronic low back pain. The current review likewise found no common set of core predictor variables, nor even common outcome measures or follow-up points, indicating that little has changed since the Kuijer et al. review [52]. A recent Cochrane review by Hayden et al. [53] focussed specifically on whether recovery expectations predict outcomes, including work participation (of which absence is one measure), in a population with non-specific low back pain. Hayden et al. reported moderate quality evidence that positive recovery expectations are strongly associated with better work participation. This finding is partly supported by the results of this review, in which, broadly, the better a participant’s recovery expectations, the better the outcome. Other research has also identified previous absence as a predictor of future absence, and whilst the evidence in the current review was weak, the general direction of effect supported this finding [15, 54, 55].

A recent review by Ravinskaya et al. [56], which assessed the reporting of work outcomes in randomised controlled trials, also reported variability in work participation outcomes, including work absence, which was measured as return-to-work rate, time to return to work, sick leave rate and sick leave duration. The authors concluded that a core outcome set for the measurement of work participation is required, and have since developed that core outcome set, recommending that studies including participants who are absent from work should report the proportion of workers who return to work and the time to return to work [57]. Adoption of this core outcome set would allow better comparison between studies and may allow more pooling of data to strengthen the body of evidence.

All the studies included in this review meet the criteria for exploratory prognostic studies and models, in that they describe associations and develop prediction models as described by Kent et al. [58]. Exploratory prognostic studies are usually carried out where little is known about a condition, and they are an essential early step towards a confirmatory study [58]. However, given the number of studies included in this review and the number of prognostic factors measured, it is difficult to argue that little is known about what predicts work absence in those with musculoskeletal pain. Whilst there will be important predictors not measured in these studies, our review indicates that there are commonalities in the concepts that may predict work absence, but wide variety in how the specific prognostic factors within these concepts are measured. For example, the main concept that showed any predictive ability was age; however, age was measured in a variety of ways, including “per year” [39], in 5-year increments [28] and in various categories [32, 35, 44], making meaningful comparisons between studies difficult. Nevertheless, most prognostic studies in the field of musculoskeletal conditions are currently exploratory, indicating that further research is needed to move this field forward [59], in particular to examine why models and factors differ in the extent to which they predict absence.

Conclusion

This study has systematically reviewed the evidence for prognostic factors of future sickness absence in those with musculoskeletal conditions who are currently experiencing absence. Overall, the evidence for all prognostic factors was weak, although there was some evidence that older age and better recovery expectations were protective against future absence and that previous absence was likely to predict future absence. The evidence for all of the prognostic models in predicting future sickness absence was also weak. Analysis was hampered by the wide range of measures of both prognostic factors and outcome and by the differing timescales for follow-up. Future research should ensure that consistent measures are employed and, where possible, that these are in line with those recommended by Ravinskaya et al. [57].