Background

Due to a large number of days on sick leave, rehabilitation measures and pension applications that are related to musculoskeletal disorders (MSD) (Bevan 2015; Murray et al. 2015; Deutsche Rentenversicherung Bund (DRV) 2019a), the question of whether an individual's physical capacity meets the physical demands of work frequently arises in the context of occupational and rehabilitation medicine. To promote a valid assessment of the ability to safely perform activities required at work, Functional Capacity Evaluation (FCE) procedures became part of common clinical practice (Ansuategui Echeita et al. 2019). These assessments are defined as “an evaluation of capacity of activities that is used to make recommendations for participation in work while considering the person’s body functions and structures, environmental factors, personal factors and health status” (Soer et al. 2008).

Various studies have been conducted to examine the measurement properties of FCE procedures. The most recently published reviews (Baets et al. 2018; Kuijer et al. 2012) indicate that there is some evidence for the predictive validity of FCE information with regard to work participation. Overall, the predictive validity seems to be modest. However, the comparability of these results is limited due to differences in participants (e.g. chronic low back pain (Gross and Battié 2004) vs. upper extremity disorders (Gross and Battié 2006)), the FCE information that was used to predict work participation (e.g. number of failed tests (Branton et al. 2010) vs. carrying and lifting tests (Gouttebarge et al. 2009)) as well as the operationalization of work participation (e.g. time until the suspension of time-loss benefits (Gross and Battié 2006) vs. return to work (RTW) (Streibelt et al. 2009)). In addition, several attempts were made to evaluate criteria in terms of their impact on FCE outcomes like the maximal lifting capacity. For example, Gross and Battié (2005) found that “younger, male subjects who reported lower levels of perceived disability and pain intensity lifted at higher levels”. The last three criteria were confirmed in a multicountry study (Ansuategui Echeita et al. 2019).

These results are consistent with the assumption that female sex, pain and perceived disability are negatively related to the ability to handle heavy loads. However, they do not allow conclusions to be drawn regarding whether sex, pain intensity and perceived disability affect the validity of FCE outcomes. In fact, little is known about criteria that are associated with a lower or higher risk of incorrectly assessing the ability to perform work-related activities or to meet the physical demands of work (Bühne et al. 2020). Such findings would be of great benefit for both FCE developers and health care professionals. For developers, they might provide the opportunity to estimate the utility or necessity of modifications and further developments. Health care providers and other professionals who use FCE information for therapy planning or, for example, RTW decisions could derive information about the robustness of FCE results. To our knowledge, only two studies have addressed this issue so far. The data were collected in patients with nonspecific low back pain (Cheng and Cheng 2010) and distal radius fracture (Cheng and Cheng 2011), respectively. Within the first study, a negative impact was found for compensable injury and for high physical work demands. In addition, “a 1-day increase on the days from injury to FCE would reduce it [the predictive validity] by 51.7%” (Cheng and Cheng 2010). However, due to methodological uncertainties, the results cannot be generalized. Thus, the aim of the present study was to identify patient-related criteria that affect the predictive validity of FCE-based estimations of the ability to cope with physical demands of work.

Methods

Study design and setting

A multicenter cohort study was conducted in eleven German rehabilitation centers offering work-related medical rehabilitation (WMR) programs (Bethge et al. 2019) for patients with various chronic or injury-related MSD. These WMR interventions last three to four weeks and are delivered in both inpatient and outpatient settings. In addition to ordinary rehabilitation, work-related components with a total extent of at least eleven hours are provided. In patients with MSD, FCE procedures constitute a key component of job-related diagnostics. The additional therapeutic components include work-related psychosocial groups, intensified social counselling and work-related functional capacity training (Bethge et al. 2019).

Data were collected by questionnaires at admission, discharge and the 3-month follow-up between August 2018 and August 2020. The study was reviewed by the Ethics Committee of the German Sport University Cologne (118/2017) and funded by the German Pension Insurance.

Participants

Eligible patients with MSD and a high risk of non-RTW, e.g. due to low self-efficacy with regard to work demands or numerous days of sick leave, were referred to rehabilitation by the German Pension Insurance (ten centers) or the German Social Accident Insurance (one center). The specific criteria depended on the screening methods used. In the case of the very common procedure SIMBO (Screening Instrument for Identification of a Demand for Medical-Vocational Oriented Rehabilitation), for example, individuals are considered to have a high risk of non-RTW if they expect not to be able to work in their profession in the near future and if the number of sick days within the last 12 months exceeds 26 weeks (Streibelt et al. 2007). The inclusion criteria were the presence of musculoskeletal disorders, absence of contraindications such as acute injuries or serious comorbidities (e.g. cardiac insufficiency, angina pectoris) and signed consent to participate. Patients were excluded if they reported significant changes in their physical work demands or their physical capacity that occurred after discharge and were unrelated to the individual physical work ability (e.g. loss of physical capacity due to an injury during leisure time, job change that was not initiated as a consequence of excessive physical demands).

Measures

RTW outcome

Return to work was considered successful if participants were working full-time (≥ 35 h/week) or part-time (15–34 h/week) at the 3-month follow-up and did not reach or exceed a total duration of 1.5 weeks of sick leave due to MSD following discharge from the WMR program (Streibelt et al. 2009; Bühne et al. 2020). In both cases, the information was taken from the written post-survey questionnaire.
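The RTW criterion above amounts to a simple decision rule. The following is a minimal sketch and not the authors' implementation: the function and parameter names are hypothetical, and it assumes that weekly working hours and cumulative MSD-related sick leave (in weeks) can be derived from the follow-up questionnaire.

```python
def successful_rtw(hours_per_week: float, msd_sick_leave_weeks: float) -> bool:
    """Apply the RTW definition from the text (names are illustrative).

    hours_per_week: working hours at the 3-month follow-up.
    msd_sick_leave_weeks: total MSD-related sick leave since discharge.
    """
    # Part-time (15-34 h/week) or full-time (>= 35 h/week) both count as working.
    working = hours_per_week >= 15
    # Sick leave must neither reach nor exceed 1.5 weeks.
    little_sick_leave = msd_sick_leave_weeks < 1.5
    return working and little_sick_leave


print(successful_rtw(38, 0.5))  # full-time with little sick leave
print(successful_rtw(20, 1.5))  # part-time, but sick leave reaches the threshold
```

Note that a sick leave duration of exactly 1.5 weeks counts as non-RTW under this reading, because the threshold must not be reached.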

FCE-based estimation of the ability to cope with the physical demands of work

The ELA method (German: Einschätzung körperlicher Leistungsfähigkeiten bei arbeitsbezogenen Aktivitäten) was used at discharge to estimate the ability to cope with the physical demands of work. ELA is a customized short-form FCE containing 24 work-related activity tests such as kneeling, torso rotation and lifting (Bühne et al. 2020). Prior to the test, an interview was carried out between a health care professional and the patient to identify work demands that might exceed the current physical capacity. Based on this, individually relevant activity tests were selected. In ELA, these tests are to be terminated by the assessor in case of a clear insecurity in the execution, dizziness or comparable observations that indicate a potential health risk. Further termination criteria are a heart rate increase above 85% of the estimated maximum (220 minus age) and achieving a weight in lifting and carrying that exceeds the work demands as well as 60% of the patient’s body weight. In addition, subjects are instructed to abort the test if they perceive to have reached their current maximum performance level.
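The termination criteria can be summarized in a short sketch. This is an illustration under stated assumptions, not part of the ELA manual: all names are hypothetical, and the load criterion is interpreted here as requiring the handled weight to exceed both the work demand and 60% of body weight, which is only one possible reading of the description above.

```python
from typing import Optional


def heart_rate_limit(age: int) -> float:
    """85% of the estimated maximum heart rate (220 minus age)."""
    return 0.85 * (220 - age)


def should_terminate(
    age: int,
    heart_rate: float,
    *,
    safety_concern: bool = False,           # e.g. clear insecurity, dizziness
    patient_reports_maximum: bool = False,  # patient perceives maximum performance
    load_kg: Optional[float] = None,        # only for lifting and carrying tests
    demand_kg: Optional[float] = None,
    body_weight_kg: Optional[float] = None,
) -> bool:
    """Return True if any of the described termination criteria applies."""
    if safety_concern or patient_reports_maximum:
        return True
    if heart_rate > heart_rate_limit(age):
        return True
    if None not in (load_kg, demand_kg, body_weight_kg):
        # Assumption: both thresholds must be exceeded.
        return load_kg > demand_kg and load_kg > 0.6 * body_weight_kg
    return False


# A 50-year-old patient lifting 50 kg (job demand 20 kg, body weight 80 kg).
print(should_terminate(50, 110, load_kg=50, demand_kg=20, body_weight_kg=80))
```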

After performing the activity tests, the FCE-certified assessors compared the demonstrated functional capacity to the individual job demands for each activity. These extrapolations to the entire working day were then summarized in an estimation of the patient’s ability to cope with physical work demands using a 5-point scale (from very poor through moderate to very good). The characteristics of the procedure are described in detail elsewhere (Bühne et al. 2020). The assessments were guided by 26 physical therapists, eight occupational therapists and ten sports scientists with an average professional experience of 11.3 years (0–27 years). The average FCE experience was 2.0 years (0–15 years).

The predictive validity of ELA was demonstrated in workers with MSD. A positive ELA outcome (moderate to very good) was associated with a sixfold higher RTW chance after adjusting for the current pain intensity, the Work Ability Score and further patient-reported data (Bühne et al. 2020). To our knowledge, reliability has not yet been evaluated.

Predictive validity of the ELA outcome

Based on the results of a previous study (Bühne et al. 2020), the ELA-based estimation of the patient’s ability to cope with physical work demands was considered valid if RTW was paired with a positive FCE outcome (moderate to very good) or if non-RTW was accompanied by a negative FCE outcome (rather or very poor). In the remaining cases, the ELA result was judged as non-valid.

To avoid false conclusions regarding the predictive validity of the ELA result, adjustments were made in three conditions. The rating was inverted in participants who reported a change in job or a reduction of their physical work demands that enabled them to return to work successfully. The rating was also inverted when participants reported “having great difficulty” or “being unable” to perform the usual tasks at their workplace without restrictions (Müller et al. 2014). In these cases, despite RTW, participants were considered not to have the required functional capacity to cope with the demands of work defined at discharge. Moreover, positive ratings of the predictive validity of ELA were converted into negative ones and vice versa when non-RTW (N-RTW) was attributed to non-physical reintegration barriers only (e.g. unsuccessful job search).
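The rating logic described in this section can be sketched as follows. The function and flag names are hypothetical, and the sketch assumes that the rating is inverted at most once, even if two adjustment conditions apply to the same participant; the text does not specify this case.

```python
def ela_rating_valid(
    rtw: bool,
    ela_positive: bool,
    *,
    rtw_via_reduced_demands: bool = False,    # job change or reduced physical demands
    great_difficulty_at_work: bool = False,   # "great difficulty" or "unable"
    nonphysical_barriers_only: bool = False,  # e.g. unsuccessful job search
) -> bool:
    """Rate an ELA estimation as valid (True) or non-valid (False)."""
    # Base rule: the estimation is valid when it agrees with the RTW outcome.
    valid = rtw == ela_positive

    # The first two adjustments concern participants who returned to work,
    # the third concerns participants who did not.
    invert = (rtw and (rtw_via_reduced_demands or great_difficulty_at_work)) or (
        not rtw and nonphysical_barriers_only
    )
    return (not valid) if invert else valid


# RTW after a positive ELA result, but only with great difficulty at work:
print(ela_rating_valid(True, True, great_difficulty_at_work=True))
```

In this example the initially concordant rating is inverted, because the participant is assumed not to have had the functional capacity required for the work demands defined at discharge.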

Patient characteristics

Using questionnaires, the characteristics shown in Table 1 were collected. These procedures and variables were chosen because they are recommended for use in WMR or have been shown to be meaningful with regard to FCE or RTW outcomes.

Table 1 Patient-reported characteristics collected to identify aspects that affect the predictive validity of FCE information

In addition, FCE assessors recorded the level of physical work demands according to the REFA (Association for Work Design, Business Organization and Business Development) classification (DRV 2019b) as well as the injury location (upper extremities, lower extremities, trunk or multiple regions). ICD-10 diagnoses were extracted from the medical report. Regional unemployment rates were assigned based on the counties of residence.

Statistical methods

Descriptive statistics were used to report sample characteristics. Differences between dropouts and the final sample were analyzed using the chi-square test, the Mann-Whitney U test and the t test. Logistic regression models were calculated to identify patient characteristics that affect the predictive validity of ELA. Continuous health- and work-related characteristics were either split according to established cut-offs or transformed into three-level variables to capture U-shaped associations (Table 1). These U-shaped associations were to be expected, since valid estimations of the ability to cope with physical demands of work are more likely in cases in which patients report very favorable (e.g. high level of perceived work ability, no pain) or very unfavorable RTW conditions (e.g. low level of perceived work ability, severe pain). Initially, all 28 characteristics were examined separately, adjusted only for the ELA result. Those characteristics with p < 0.1 were then included collectively in further models, differentiating (1) sociodemographic characteristics, (2) primarily health-related and (3) primarily work-related patient reports as shown in Table 1. The final model was calculated based on those characteristics that remained below the cut-off (p < 0.1) within these three multiple models. All models were adjusted for the ELA result. Given the small number of participants whose ability to cope with physical work demands was classified as very good or very poor, the ELA outcome was converted into a three-point scale (≥ rather good vs. moderate vs. ≤ rather poor). Nagelkerke's R² and the area under the receiver operating characteristic curve (AUC) were calculated to assess the model fit. The discrimination was considered acceptable if AUC ≥ 0.7 and excellent if AUC ≥ 0.8 (Hosmer et al. 2013). Odds ratios (OR) with 95% confidence intervals (CIs) were calculated as effect estimates.
Statistical tests were regarded as significant if the p value was less than 0.05. Before calculating multiple models, the presence of multicollinearity (r ≥ 0.7; variance inflation factor ≥ 10) was evaluated. A complete case analysis was performed. All calculations were carried out with SPSS 27. Expecting a drop-out rate of 35%, a proportion of valid ELA outcomes of 75% and a multiple model with about ten independent variables, the recruitment of 650 participants was targeted a priori to avoid overfitting.
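The text does not state which rule the sample size target was based on. As a rough plausibility check, the sketch below assumes the common heuristic of about ten outcome events per variable and takes non-valid ELA outcomes as the less frequent event; both are assumptions made here for illustration, and the variable names are hypothetical.

```python
# Planning figures taken from the text.
recruited = 650
dropout_rate = 0.35
valid_share = 0.75
n_predictors = 10

# Participants expected to remain for analysis after drop-out.
analyzable = recruited * (1 - dropout_rate)

# Assumption: the less frequent outcome (non-valid ELA result) limits the
# number of predictors a logistic regression model can support.
events = analyzable * (1 - valid_share)
events_per_variable = events / n_predictors

print(f"analyzable participants: {analyzable:.1f}")
print(f"expected non-valid outcomes: {events:.1f}")
print(f"events per variable: {events_per_variable:.1f}")
```

Under these assumptions, the target of 650 participants corresponds to slightly more than ten events per variable.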

Results

Flow of participants and sample characteristics

The recruitment was stopped prematurely in April 2020 due to the COVID-19 pandemic. By this time, 550 participants had already been recruited. The following analyses are based on 303 participants (the flow of participants is illustrated in Fig. 1). Overall, the proportion of missing values was 1.3%.

Fig. 1 Flow of participants

Compared to the participants who did not take part in the follow-up survey, the final sample was older (51.0 vs. 44.5 years; p < 0.001), more often female (48.2% vs. 36.1%; p = 0.011), more often employed at admission (84.8% vs. 76.3%; p = 0.022) and more likely to be in a stable relationship (73.6% vs. 62.4%; p = 0.010).

The number of cases per clinic ranged from 3 to 65 (mean = 27.5). The characteristics of the resulting sample are presented in Table 2 and the appendix. The most prevalent ICD-10 diagnoses were M54.4 (Lumbago with sciatica; n = 40), M53.1 (Cervicobrachial syndrome; n = 28) and M51.2 (Other specified intervertebral disc displacement; n = 18).

Table 2 Patient characteristics (N = 303)

RTW and the predictive validity of ELA

The ability to cope with the physical demands of work was estimated as moderate or better in 180 participants (59.4%) and rather or very poor in 123 participants (40.6%). These estimations were based on an average of 3.6 activity tests (SD = 1.5). At 3-month follow-up, 117 participants (38.6%) had successfully returned to work. Concordance between the FCE and RTW outcome was found in 210 participants (69.3%; Table 3). Eight participants reported that RTW resulted from a change in job or a reduction in work demands and ten stated limitations in productivity. N-RTW was attributed to non-physical reintegration barriers only by 12 participants. After considering these aspects, the ELA result was rated as valid in 208 (68.6%) cases.

Table 3 RTW depending on the ELA-based estimation of the ability to cope with the physical demands of work

Characteristics influencing the predictive validity of ELA

Regarding the current pain intensity, a strong correlation with the pain-related disability at work (r = 0.742; p < 0.001) was observed. Due to multicollinearity, the current pain intensity was not considered in the analyses. Among sociodemographic characteristics, the following variables exceeded the critical p value of 0.1 and were therefore excluded from further analyses: gender, age, marital status, highest professional qualification, migration background, unemployment rate and level of physical work demands (see appendix). Based on the multiple model calculated for the remaining variables (“model 1”), employment status and native language were included in the final model (Table 4). As a result of model 2, general health perception, duration of sickness absence within the previous 12 months and psychosocial distress were selected for further analysis among the primarily health-related characteristics. With regard to the third group of primarily work-related patient reports, two characteristics did not exceed the critical p value in model 3: pain-related disability at work and expected RTW duration. In the fourth and final model, which achieved a high goodness of fit (AUC = 0.877), a native language other than German (OR 0.16; 95%-CI 0.05–0.56), psychosocial distress (OR 0.35; 95%-CI 0.15–0.82), a moderate (OR 0.15; 95%-CI 0.05–0.46) and strong pain-related disability at work (OR 0.19; 95%-CI 0.05–0.72) as well as the expectation to return to work after more than one month (OR 0.17; 95%-CI 0.06–0.46) were associated with a lower chance of a valid ELA outcome. Compared with 0-coded characteristics only, for example, the likelihood of valid ELA results decreased from 96.9% to 61.7% when psychosocial distress and a moderate pain-related disability at work were reported.
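The last example can be retraced approximately from the reported odds ratios. The following minimal sketch converts the baseline probability into odds, multiplies by the odds ratios and converts back; the function name is hypothetical. Because it uses the rounded values printed above, it yields roughly 62% rather than the reported 61.7%.

```python
def shift_probability(baseline_prob: float, odds_ratios: list[float]) -> float:
    """Apply odds ratios to a baseline probability in a logistic model."""
    odds = baseline_prob / (1 - baseline_prob)  # probability -> odds
    for odds_ratio in odds_ratios:
        odds *= odds_ratio                      # each characteristic scales the odds
    return odds / (1 + odds)                    # odds -> probability


# Baseline of 96.9% (all characteristics 0-coded), then psychosocial
# distress (OR 0.35) and moderate pain-related disability at work (OR 0.15).
p = shift_probability(0.969, [0.35, 0.15])
print(f"{p:.1%}")
```

The multiplication of odds ratios presupposes a model without interaction terms, which matches the main-effects models described in the statistical methods.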

Table 4 Associations between patient characteristics and the predictive validity of ELA: logistic regression models

Discussion

The aim of the present study was to systematically identify patient-related characteristics that affect the predictive validity of FCE information. Data were collected from patients with MSD recruited in eleven rehabilitation centers offering WMR. As previous studies did not allow a pre-selection of relevant characteristics, the study had an exploratory character.

The ELA-based estimation of the participants' ability to cope with physical work demands was found to be valid in 208 of 303 (68.6%) cases. Among the numerous variables considered, four proved to be significant in predicting the validity of ELA: native language, psychosocial distress, the expected duration until RTW and the pain-related disability at work.

An increased chance for non-valid ELA outcomes was found in participants who reported an expected RTW duration of more than one month (OR 0.17). This result can be explained by the nonlinear relationship between RTW beliefs and the predictive validity of FCE-based estimations of the ability to cope with physical demands of work: The greater the (actual and perceived) discrepancy between abilities and requirements, whether positive or negative, the easier and clearer the interpretation of test results. For individuals who neither believe that they will return to work within a short period of time nor that they will not succeed in doing so at all, the profile comparison is, in general, less evident for FCE assessors and the possibility to return to work thus more difficult to estimate. This assumption is consistent with the finding that the ability to cope with the physical demands of work was predominantly estimated as moderate based on the ELA test in patients with an expected RTW duration of more than one month in case of non-valid ELA-results (27 of 44). Furthermore, it is to be considered that the RTW status might have been caused by a self-fulfilling prophecy in some cases. This may not only have influenced the proportion of valid ELA results but may also have led to an overestimation of the importance of the expected RTW duration.

Compared to those participants who reported a low pain-related disability at work or less, a moderate (OR 0.15) and strong disability (OR 0.19) were associated with an increased chance of non-valid ELA results. This finding can also be attributed to the U-shaped association described above. Furthermore, an increase in the pain-related disability at work seemed to have an independent negative impact on the predictive validity of ELA. This finding was to be expected, as health care professionals frequently describe the difficulty of differentiating between pain and the (work-related) physical capacity as a key challenge in FCE. Under what conditions is it acceptable to confront patients with work demands while they are in pain? And under what conditions can patients be considered to be able to cope with the physical demands of work despite a pain reaction during FCE? The comparable influence of a moderate and strong pain-related disability at work may therefore be attributed to an increased importance of these questions with greater perceived limitations.

Considering the non-significant influence of the migration background (and thus possible cultural differences), the negative impact of a native language other than German is probably related to a greater difficulty in describing the physical work demands specifically. In contrast, the negative influence of psychosocial distress (OR 0.35) can be attributed to non-physical reintegration barriers.

Non-valid ELA results were primarily related to individuals who did not achieve RTW despite an ability to cope with the physical demands of work that was estimated as moderate. It is therefore questionable whether this estimation should be interpreted positively regarding the ability to cope with the physical demands of work, as in the previous study (Bühne et al. 2020). The present findings rather indicate that this category should be interpreted neither positively nor negatively. However, the influence of the four characteristics changed only slightly when restricting the analysis to those individuals whose FCE result could be clearly assigned as positive (≥ rather good) or negative (≤ rather poor). Hence, we consider a substantial bias (resulting from the interpretation of the category “moderate”) to be unlikely.

In contrast, no significant associations were found among the numerous other characteristics. Consequently, the predictive validity of ELA was not affected by age, gender, ICD-10 diagnoses, employment status or fear-avoidance beliefs, nor by the aspects that Cheng and Cheng (2010, 2011) associated with lower predictive validity, namely the level of physical work demands and days from injury to FCE.

It is questionable to what extent the present findings can be applied to further FCE procedures. With regard to future studies, it would therefore be beneficial to conduct a comparable analysis using, for example, outcomes of widely used assessments like the WorkWell Systems FCE (Bieniek and Bethge 2014) or Ergo-Kit (Caron et al. 2015).

Limitations

Despite the strengths of the study (multicentric design, holistic consideration of participant characteristics as well as the inclusion of productivity, adjustments in work demands post discharge and nonphysical RTW barriers in estimating the predictive validity of ELA), we have to consider the following limitations: (1) With regard to the transformations of continuous variables, which were carried out to address non-linear associations, it must be pointed out that these changes are accompanied by a loss of information. With respect to the work-related characteristics, however, the U-shaped association was confirmed when using the non-modified continuous work-related variables including their second and third power. (2) Studies indicate that the performance of the FCE is associated with an improvement of the patient-reported functional ability (Schindl et al. 2019; Bühne et al. 2017). Since participants were not blinded to the result of the ELA test, this may have affected the RTW outcome. The same applies to a possible positive reinforcement by the FCE assessors. However, in the context of WMR, there does not appear to be an overall positive impact of FCE on perceived work ability (Bühne et al. 2017). In addition, individuals who achieved RTW despite a negative RTW expectation (duration of more than one month or no RTW) constituted a clear minority in this sample. Therefore, and due to the consideration of productivity as well as the days of sick leave post discharge in estimating the ability to cope with the physical demands of work, a significant bias is considered unlikely. Assuming that the impact on self-perception is part of the essence of FCE, it should also be considered that blinding may affect the practical relevance.
(3) It cannot be ruled out that the predictive validity of ELA was also affected by insufficient information about the physical demands of work (related information from the employer is rarely available in WMR) or an inadequate selection of tests. Such a selection has to be carried out in all facilities of WMR, whereby the methods have not been specified by the German Pension Insurance. Neither does ELA provide a standardized approach. (4) The researchers were also not blinded to the ELA outcome. (5) The multidimensionality of RTW was not sufficiently considered in the present analysis. For example, days of sick leave might have been caused by social conflicts at work in some cases. Others, however, may have returned to work despite low functional capacity as they received support from their employer and co-workers. (6) Twelve participants were excluded as FCE assessors claimed not to be able to assess the ability to cope with the physical demands of work. The underlying reasons cannot be clearly derived from the available data. It may have been due to a lack of knowledge regarding the individual work demands (six patients were unemployed at this time). (7) No protocol was developed a priori. Although this was due to the exploratory nature of the analysis, it constitutes another limitation. In this respect, further studies are necessary to confirm or reject the present methods and findings. This applies in particular to the observed U-shaped associations, the procedure for estimating the predictive validity of the ELA outcome as well as the generalizability of the present findings. Despite the limitations mentioned above, we consider a substantial bias to be unlikely.

Conclusion

Among the 28 patient-related characteristics collected, an increased likelihood for non-valid ELA results was present in participants who reported a moderate or strong pain-related disability at work, a native language other than German, psychosocial distress as well as in those who expected to return to work after more than one month.