Design
A cross-sectional, observational study was performed within usual care between September 2015 and April 2016. Ten countries participated in the study: the Netherlands, Canada, Switzerland, Germany, Austria, South Africa, Australia, New Zealand, China, and the United States of America.
Procedure
Initially, the project team developed a study protocol. Participants were then recruited through snowball sampling: contacts involved in FCE research in different countries were approached and asked to join the study as representatives of their countries. In addition to being in charge of setting up the study in their country and collecting data according to the protocol, the representatives were responsible for recruiting clinicians involved in FCE assessments from various facilities within their country. Participating clinicians assisted in enrolling patients undergoing FCE assessments. Informed consent was obtained from participating patients and clinicians. Participating clinicians were asked to complete a set of questionnaires before enrolling patients in the study, and participating patients were asked to complete additional questionnaires before performing the FCE tests. Data from three FCE domains were collected: material handling (floor-to-waist lift), energetic capacity (six-minute walk test), and hand and finger strength (handgrip strength).
Approval to perform the study was obtained from the relevant research ethics boards of the countries where data were collected. All procedures were in accordance with the ethical standards of the Helsinki Declaration of 1975 as revised in 2014 [21].
Participants and Societal Context
Patients
Patients who were scheduled for FCE and met the inclusion criteria were eligible to participate in the study. Inclusion criteria for patients were: age over 18 years, non-specific sub-acute or chronic musculoskeletal pain, and sufficient language skills to understand the instructions. Patients were excluded if they were pregnant, retired, or on permanent sick leave, had a specific musculoskeletal diagnosis (e.g. fractures, tumors, radicular syndromes), or had co-morbidities affecting performance or safety during FCE (e.g. cardiovascular conditions).
Clinicians
Because clinicians’ characteristics were considered a potentially relevant social variable that could influence FCE results, clinician data were collected. Clinicians who administered FCE in routine clinical practice and met the inclusion criteria were eligible to participate. Inclusion criteria for clinicians were: FCE-trained clinicians with at least 1 year of experience conducting FCE, more than 20 FCEs administered, and sufficient understanding of English to complete the compulsory questionnaires.
Societal Context
Participants were recruited from one or more facilities in each country. System or societal characteristics of the eight countries that participated in the study were collected and are presented in Online Resource 1.
The total sample comprised patients from different contexts. The Dutch sample was obtained from two outpatient rehabilitation centers, to which patients had been referred by their company doctor for a multidisciplinary assessment. The Canadian sample was obtained from one rehabilitation center, where individuals undergoing FCE had been injured in motor vehicle accidents or were off work because of non-work-related injuries. These patients were tested to determine disability related to previous employment (return-to-work) or to activities of daily living (ADL), depending on whether the person was working before the injury. The Swiss sample was obtained from three inpatient and outpatient rehabilitation centers. Inpatients were referred for a 3-week rehabilitation program and underwent FCE for therapy planning and return-to-work assessment, whereas outpatients were referred by their disability insurance company to determine their level of work-related disability. The German sample was recruited from six multi-professional, work-related medical rehabilitation settings. These patients had musculoskeletal disorders and performed FCE prior to admission to a 3-week rehabilitation program. The Austrian sample was obtained from one rehabilitation center run by the AUVA (General Accident Insurance Institute). Patients were manual workers who had had an accident at work and performed FCE for return-to-work assessment. The South African sample was obtained from three facilities. One facility performed FCEs to determine work-related disability for insurance companies, whereas the other two were medico-legal practices where FCE was performed to assist in case settlements after road accidents. The New Zealand sample comprised long-term claimants who had not returned to work after failing to respond to rehabilitation. These patients underwent FCE for return-to-work assessment, therapy continuation assessment, or disability determination. The Chinese sample was obtained from one hospital in Hong Kong, where patients performed FCE for return-to-work assessment and prior to participating in work hardening rehabilitation programs.
Measurements
FCE Variables
This study allowed a variety of FCE protocols, provided that the reliability of their lifting test had been demonstrated in peer-reviewed articles. The compulsory FCE measurements are described in detail below.
Floor-to-Waist Lift Test
The test characteristics described below are in accordance with the WorkWell protocol (formerly Isernhagen Work Systems) [6], the WEST-EPIC protocol [22], or the Blankenship protocol [23, 24]. These protocols differ in the type of material used and in the standardization of the instructions.
The WorkWell protocol was administered as a progressive performance test, which began with an easily lifted weight that was gradually increased until the evaluator determined a “safe maximum lift” or until the patient stopped lifting. Patients were instructed to perform repeated series of lifts of a loaded box, with as much weight as was safely possible, from a shelf at waist height to the floor and back to the shelf. Five repetitions were performed with each weight, from the initial weight up to the “safe maximum lift” weight. The safe lifting endpoint was defined as the maximum load a patient could lift five times while maintaining a stable spine and without exceeding the patient’s physiological limits [e.g. heart rate (HR)].
The WEST-EPIC protocol was also conducted as a progressive performance test. The lift test was divided into cycles, each composed of three subtests (knuckle-to-shoulder, floor-to-knuckle, and floor-to-shoulder). Before the weight was increased, each cycle was performed at two frequencies: one lift per subtest and, if the patient was capable, four lifts per subtest. The test began with an empty standardized crate of 4.5 kg, which was gradually loaded with masked weights, so that patients were blind to the load throughout the test. After each cycle, patients were asked whether they would be able to perform that task in a “safe and dependable manner eight to twelve times a day”. The “maximum acceptable load” was identified by observing the patient’s HR, posture and body mechanics, and psychophysical response.
The Blankenship protocol was performed as a progressive lifting test, used to determine how much weight the patient was able to lift at an occasional frequency (0–33% of the workday). The test began with an empty standardized crate of 4 kg, which was gradually loaded with weights up to a maximum weight decided by the patient. Aspects of reliability and validity have been studied for all FCE protocols used in this study [22, 23, 25, 26].
Clinicians recorded the patient’s maximum weight lifted in kilograms, along with HR before and after the test, patient-reported effort measured with Borg’s CR-10 scale [27], and the clinician’s observation of the patient’s physical effort [28, 29]. In addition, the reason for ending the test was recorded [30].
Six-Minute Walk Test (6MWT)
The 6MWT was performed according to the recommendations of the American Thoracic Society [31]. The test was carried out on a flat, hard surface with two markers (e.g. tape, traffic cones) placed 30 m apart. Patients were instructed to walk back and forth between the two markers at their own pace, covering as much distance as possible in 6 min. Running or jogging was not allowed; however, patients were allowed to stop and rest during the test. The 6MWT has shown acceptable test–retest and inter-rater reliability, criterion validity, and acceptability in adults with chronic pain, fibromyalgia, and chronic fatigue [32]. In addition to the distance walked in meters, the patient’s HR before and after the test, patient-reported effort measured with Borg’s CR-10 scale [27], whether the test was stopped prematurely, and the reason for ending the test [30] were recorded.
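The text does not specify how the walked distance was counted. A minimal sketch, assuming the clinician counts completed 30 m lengths between the two markers and measures the partial length walked when the 6 min expire; the function and constant names are hypothetical:

```python
COURSE_LENGTH_M = 30.0  # distance between the two markers, as in the protocol


def six_minute_walk_distance(completed_lengths: int, final_partial_m: float) -> float:
    """Total 6MWT distance in meters.

    Assumption (not stated in the protocol): the clinician counts completed
    marker-to-marker lengths and measures the partial length at the end.
    """
    if completed_lengths < 0 or not 0.0 <= final_partial_m < COURSE_LENGTH_M:
        raise ValueError("invalid length count or partial distance")
    return completed_lengths * COURSE_LENGTH_M + final_partial_m


# Hypothetical example: 15 full lengths plus 12.5 m
print(six_minute_walk_distance(15, 12.5))  # 462.5
```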
Handgrip Strength Test
Grip strength was measured with an adjustable-handle dynamometer. For standardization, a Jamar dynamometer (or compatible device) was used with the handle set in the second position. Following the procedure described by Mathiowetz et al. [33], patients were seated with the shoulder adducted and neutrally rotated, elbow flexed at 90°, forearm in neutral position, and wrist between 0° and 30° of dorsiflexion and between 0° and 15° of ulnar deviation. In that position, they were instructed to squeeze the dynamometer as hard as possible in three successive trials, with the left and right hands tested separately. The mean grip strength of each hand was calculated and recorded in kilograms. The handgrip strength test has demonstrated acceptable reliability in healthy individuals and in patients with cervical radiculopathy [34].
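The value recorded for each hand is the arithmetic mean of its three trials. A minimal sketch with hypothetical trial values (not study data); the function name is illustrative:

```python
from statistics import mean


def mean_grip_strength(trials_kg: list[float]) -> float:
    """Mean of the three successive trials for one hand, in kilograms."""
    if len(trials_kg) != 3:
        raise ValueError("expected three trials per hand")
    return mean(trials_kg)


# Hypothetical trials for one patient (kg); each hand is averaged separately
right_kg = mean_grip_strength([32.0, 34.0, 33.0])  # 33.0
left_kg = mean_grip_strength([28.0, 30.0, 29.5])
```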
Biopsychosocial Variables
Data from healthcare, workplace, legislative, and personal systems as well as clinician and patient characteristics were collected [35].
Patients’ Demographic Characteristics
Age; sex; height, weight, body mass index (BMI); affected body area, duration of pain; country whose social system applied to the patient; cultural background as measured by nationality; mother language; educational level; employment characteristics: job and physical work demands per Dictionary of Occupational Titles (DOT); work status; days off work due to pain; and compensation status.
Brief Psychological Screening
Eight self-reported screening questions covering five psychosocial risk factors associated with pain [36]: depression, anxiety, social isolation, catastrophizing, and fear of movement. Responses were standardized on a 0 to 10 scale, where lower scores indicated lower risk. Moderate to high correlations with the corresponding full-length questionnaires have been demonstrated for all five factors [36].
Disability (Pain Disability Index—PDI)
A 7-item self-reported questionnaire measuring the degree to which pain interferes with functioning across a range of activities: family/home responsibilities, recreation, social activity, occupation, sexual behavior, self-care, and life-support activity. Each item is scored from 0 (no interference) to 10 (total interference), and the total score ranges from 0 to 70, where 70 indicates total interference with life activities. The PDI has been shown to be a valid and reliable measure of pain-related disability, with sufficient internal consistency [37].
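Scoring the PDI amounts to summing the seven item scores. A minimal sketch; the dictionary keys and the range checks are illustrative choices, not part of the published instrument:

```python
PDI_ITEMS = (
    "family/home responsibilities", "recreation", "social activity",
    "occupation", "sexual behavior", "self-care", "life-support activity",
)


def pdi_total(scores: dict[str, int]) -> int:
    """Sum the seven 0-10 item scores into a 0-70 PDI total."""
    missing = set(PDI_ITEMS) - set(scores)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    for item in PDI_ITEMS:
        if not 0 <= scores[item] <= 10:
            raise ValueError(f"score for {item!r} outside 0-10")
    return sum(scores[item] for item in PDI_ITEMS)


# Hypothetical respondent scoring 5 on every item
print(pdi_total({item: 5 for item in PDI_ITEMS}))  # 35
```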
Pain Intensity (Numeric Rating Scale—NRS)
A self-reported scale measuring current pain intensity in adults. The scale ranges from 0 (no pain) to 10 (worst possible pain). The reliability and validity of the NRS have been established for patients with rheumatic pain conditions [38].
Work Ability (Work Ability Score—WAS)
A single-item question of the Work Ability Index (WAI), which measures patients’ current work ability compared with their lifetime best. This item yields a score between 0 (unable to work) and 10 (work ability at its best). The WAS has been shown to be a good alternative to the full 28-item WAI [39].
Clinicians’ Demographic Characteristics
Age; sex; profession; workplace (facility, canton/province/state, country); clinical experience; and FCE experience.
Clinicians’ Pain Beliefs (Adapted Back Beliefs Questionnaire—BBQ)
A questionnaire measuring an individual’s beliefs about back trouble. For the purpose of this study, the questionnaire was adapted to measure clinicians’ beliefs about musculoskeletal pain by replacing ‘back trouble’ with ‘musculoskeletal pain’. The BBQ assesses the level of agreement with nine statements on a 5-point Likert scale (no agreement to total agreement). The total score ranges from 9 to 45, with lower scores indicating more negative beliefs about pain. The original questionnaire has shown internal consistency and excellent reliability in manufacturing workers [40], as well as construct validity and test–retest reliability in the general population [41, 42].
FCE Characteristics
Purpose of undergoing FCE; whether the results had a direct effect on the patient’s financial situation; and type of protocol performed.
Data Analysis
Data records from all participating countries were merged into a single database. Some variables were recoded for statistical purposes, either because of uneven distributions (pain duration and days off work were recoded into six categories, and amount of compensation into five) or to form groups with similar characteristics (work status and affected body part were converted into new variables with six and five categories, respectively).
The dataset was checked for missing data and outliers. If information was missing for more than 5% of cases, the distribution of missing values per variable was checked by comparing the FCE test results of patients with missing data with those of patients without missing data. For continuous dependent variables, independent-samples t tests or Mann–Whitney tests were used. The relevance of statistically significant differences was further examined by comparing the medians of the two groups with boxplots. The influence of outliers (values more than three SD from the mean) was examined with Cook’s distance and leverage values.
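The outlier diagnostics above were run in SPSS. As an illustration only, a minimal NumPy sketch of the same quantities for a single-level ordinary least-squares fit: a three-SD flag, leverage (the diagonal of the hat matrix), and Cook’s distance. The function names and the example data are hypothetical, and no decision thresholds are implied since the text does not state any:

```python
import numpy as np


def ols_influence(x: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Leverage (hat values) and Cook's distance for an OLS fit of y on x."""
    X = np.column_stack([np.ones(len(y)), x])  # add intercept column
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Diagonal of the hat matrix H = X (X'X)^-1 X'
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    mse = resid @ resid / (n - p)
    cooks_d = resid**2 / (p * mse) * leverage / (1 - leverage) ** 2
    return leverage, cooks_d


def beyond_3sd(values: np.ndarray) -> np.ndarray:
    """Boolean mask of values more than three SD from the mean."""
    z = (values - values.mean()) / values.std(ddof=1)
    return np.abs(z) > 3


# Hypothetical data with one influential case at the end
x = np.arange(10.0)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, 0.2, -0.15, 0.1, -0.05, 0.0])
y = 2 * x + 1 + noise
y[9] += 15
lev, cook = ols_influence(x, y)
print(int(np.argmax(cook)))  # 9
```

Leverage sums to the number of model parameters (here 2), which is a quick sanity check on the hat-matrix computation.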
Descriptive statistics were calculated for patient, clinician, and FCE characteristics and presented as means and standard deviations for continuous variables and as counts and percentages for categorical variables. To assess the explanatory value of biopsychosocial determinants for FCE results while accounting for the nested design (patients within clinicians within countries), multilevel regression analyses were performed. The models were specified with patients as level 1, clinicians as level 2, and countries of measurement as level 3. These multilevel models included biopsychosocial variables as independent variables and FCE test results as dependent variables.
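One way to write this nested structure, assuming random intercepts only (the text does not state whether random slopes were used), for patient \(i\) of clinician \(j\) in country \(k\) with \(M\) biopsychosocial predictors:

```latex
y_{ijk} = \beta_{0} + \sum_{m=1}^{M} \beta_{m} x_{m,ijk} + v_{k} + u_{jk} + e_{ijk},
\qquad
v_{k} \sim \mathcal{N}(0, \sigma_{v}^{2}),\quad
u_{jk} \sim \mathcal{N}(0, \sigma_{u}^{2}),\quad
e_{ijk} \sim \mathcal{N}(0, \sigma_{e}^{2})
```

Under this reading the null model is the case \(M = 0\), and the total variance is \(\sigma_{v}^{2} + \sigma_{u}^{2} + \sigma_{e}^{2}\) (country, clinician, and patient levels).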
Multilevel modeling was conducted with MLwiN software, which estimated the models using the Restricted Iterative Generalized Least-Squares (RIGLS) method. To establish whether adding a variable significantly improved model fit, the deviance (−2 × log-likelihood) of the most recent model was compared with that of the previous model. The following process was applied:
- First, 1-level and 3-level null models were built for each dependent variable. To evaluate whether clinician and country had a significant effect on the dependent variables, the 1-level null models were compared with the 3-level null models (i.e. models accounting for the clustering of patients within clinicians and countries).
- Second, biopsychosocial variables were separately added as fixed effects to the 3-level null model. To determine the association of each variable with FCE test results, the 3-level null model was compared with the corresponding 3-level model including that variable.
- Third, the variables to be entered into the multiple multilevel models were selected. Valid multiple regression models require a minimum of ten observations per variable [43, 44]; therefore, the number of biopsychosocial variables included in the multiple multilevel regression models was limited. Variables were selected based on their statistical significance in the simple multilevel models.
- Fourth, a series of multiple multilevel models was built, with biopsychosocial variables entered using a stepwise forward method. Only independent variables that significantly improved model fit (p < 0.05) were retained. The statistical significance of the final models was established at p < 0.05.
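The deviance comparisons and the forward stepwise step above can be sketched as follows. This is not the authors’ MLwiN workflow: `deviance_of` is a hypothetical callback standing in for fitting a 3-level model with a given set of fixed effects and returning its deviance, and the example deviances are invented. The drop in deviance between nested models is referred to a chi-square distribution with as many degrees of freedom as parameters added (assumed to be 1 per variable unless `n_params` says otherwise, e.g. k − 1 for a k-category variable):

```python
from typing import Callable, Sequence

from scipy.stats import chi2


def deviance_test(dev_previous: float, dev_new: float, added_params: int = 1) -> float:
    """p value for the deviance drop between nested models (deviance = -2 log L)."""
    return float(chi2.sf(dev_previous - dev_new, df=added_params))


def forward_select(candidates: Sequence[str],
                   deviance_of: Callable[[list[str]], float],
                   n_params: Callable[[str], int] = lambda var: 1,
                   alpha: float = 0.05) -> list[str]:
    """Repeatedly add the candidate with the smallest deviance-test p value
    while that p value is below alpha."""
    selected: list[str] = []
    remaining = list(candidates)
    current_dev = deviance_of(selected)
    while remaining:
        # Test each remaining candidate added to the current model
        trials = [(deviance_test(current_dev, deviance_of(selected + [v]), n_params(v)), v)
                  for v in remaining]
        best_p, best_var = min(trials)
        if best_p >= alpha:
            break  # no remaining variable significantly improves fit
        selected.append(best_var)
        remaining.remove(best_var)
        current_dev = deviance_of(selected)
    return selected


# Hypothetical example: each variable lowers the deviance by a fixed amount
drops = {"pain_intensity": 20.0, "fear_of_movement": 8.0, "age": 1.0}


def toy_deviance(variables: list[str]) -> float:
    return 1000.0 - sum(drops[v] for v in variables)


print(forward_select(list(drops), toy_deviance))  # ['pain_intensity', 'fear_of_movement']
```

The same `deviance_test` covers the first step (1-level vs 3-level null model), with `added_params` set to the number of extra variance parameters.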
The reported results of the simple multilevel regression analyses were the variance explained by each fixed effect (\(R^{2}\)) and its p value. \(R^{2}\) was calculated as \((\sigma_{0}^{2} - \sigma_{1}^{2})/\sigma_{0}^{2}\), where \(\sigma_{0}^{2}\) is the variance of the 3-level null model and \(\sigma_{1}^{2}\) the variance of the 3-level model including the variable [45]. The reported results of the multiple multilevel regression analyses were the unstandardized coefficient of each fixed effect and its standard error. The total variance explained by the fixed effects was reported as a measure of the relevance of all fixed effects in the final models. To determine the proportion of total residual variance due to clinicians and countries, in other words, the correlation of outcomes within clinicians and countries, the intraclass correlation (ICC) was calculated.
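Both summary measures are simple ratios of variance components. A minimal sketch, assuming that \(\sigma^{2}\) denotes the total variance summed over the country, clinician, and patient levels, and that the ICC pools the clinician and country levels as the text describes; the function names and variance values are hypothetical, not study results:

```python
def explained_variance(var_null: float, var_model: float) -> float:
    """R^2 = (sigma_0^2 - sigma_1^2) / sigma_0^2."""
    return (var_null - var_model) / var_null


def icc_clinician_country(var_country: float, var_clinician: float,
                          var_patient: float) -> float:
    """Share of the total variance attributable to clinicians and countries."""
    total = var_country + var_clinician + var_patient
    return (var_country + var_clinician) / total


# Hypothetical variance components
null = {"country": 10.0, "clinician": 20.0, "patient": 70.0}
with_var = {"country": 8.0, "clinician": 18.0, "patient": 62.0}
r2 = explained_variance(sum(null.values()), sum(with_var.values()))  # 0.12
icc = icc_clinician_country(null["country"], null["clinician"], null["patient"])  # 0.3
```

The ICC function accepts the components of whichever model is of interest; since the text refers to residual variance, it would typically be fed the final model’s components.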
Diagnostic and descriptive analyses were performed using SPSS software version 22.0 (IBM Corp., NY) and multilevel regression analyses with MLwiN software version 2.35 (Centre for Multilevel Modelling, University of Bristol, UK).