Eligibility criteria (PICOS)
Population
All studies of working-age adults (18–65 years) on SA due to CMDs (i.e., mild to moderate symptoms of depression and anxiety disorders, or symptoms of stress-related conditions such as adjustment disorder or burnout) or musculoskeletal disorders were included in the review. Employment was not a requirement; unemployed individuals receiving sickness benefits and self-employed individuals were also included. Studies focusing on participants with severe mental disorders such as psychosis, bipolar disorder, or substance abuse were excluded, as were studies including participants with secondary pain due to malignant illness or pain related to a prior accident.
Interventions
All types of psychological interventions or psychotherapy were eligible. A psychological intervention was defined as a treatment that is based on a psychological model or theory, aims to influence psychological processes in order to increase function or decrease symptoms, and is delivered by qualified clinicians or treatment personnel. Examples of included therapies are problem-solving therapy (PST), cognitive behaviour therapy (CBT), psychodynamic therapy (PDT), multimodal cognitive behavioural therapy (MMCBT), and motivational interviewing (MI). Interventions without a coherent theoretical base, e.g., coaching, were excluded.
Controls
All control conditions were accepted, including psychological and non-psychological treatments, treatment as usual, pharmacological treatment, and waitlist. When a study included more than one psychological treatment and a non-psychological treatment, each psychological treatment was compared with the non-psychological treatment as the control condition. If a psychological treatment was compared with another psychological treatment within the same study, the experimental and control conditions designated by the authors of that study were considered the active treatment and the control group, respectively.
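The comparison rule above can be sketched as a small function. This is a hypothetical illustration: the `Arm` record, its `author_role` field, and `make_comparisons` are names introduced here, and the sketch assumes at most one non-psychological arm per trial, a case the text does not address.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    psychological: bool
    author_role: str = ""  # hypothetical: "experimental" or "control" as labelled by the trial authors


def make_comparisons(arms):
    """Return (active, control) arm-name pairs for one trial."""
    psych = [a for a in arms if a.psychological]
    non_psych = [a for a in arms if not a.psychological]
    if psych and non_psych:
        # Each psychological arm is compared with the non-psychological arm.
        # Assumption: one non-psychological arm; the text does not cover several.
        control = non_psych[0]
        return [(a.name, control.name) for a in psych]
    # Psychological vs psychological: keep the trial authors' own labels.
    active = [a.name for a in psych if a.author_role == "experimental"]
    controls = [a.name for a in psych if a.author_role == "control"]
    return [(x, c) for x in active for c in controls]


trial = [Arm("CBT", True), Arm("W-CBT", True), Arm("Pharmacological treatment", False)]
print(make_comparisons(trial))
# [('CBT', 'Pharmacological treatment'), ('W-CBT', 'Pharmacological treatment')]
```

In a trial with only psychological arms, the pairing falls back to the labels chosen by the trial authors rather than any judgement made in the review.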
Outcome measures
The primary outcome was time on SA, RTW, or increased working hours. Because absence from work due to sickness is defined in many ways, an outcome was included in the present meta-analysis if it fitted at least one of the following categories: time until first RTW; time until full RTW; cumulative duration of SA, i.e., total days of SA during the follow-up period (whether from one or several SA spells); recurrence of SA (number of days until a recurrence, or number of recurrences during follow-up); increased working hours; and time on disability pension. Data could be presented either as means and standard deviations (continuous) or as event data (categorical). Symptoms of depression, anxiety, and stress were included as secondary outcomes.
Study design
All randomized controlled trials (RCTs) of psychological interventions that reported an RTW or SA outcome were included.
Literature search
An extensive search was conducted up to 2017-03-06 in the following databases: Medline (Ovid), Web of Science Core Collection, Scopus, PsycInfo (Ovid), and PubMed. The initial search was conducted on 2014-12-18, and the final search strategy was updated at two time points (2016-10-21 and 2017-03-06). Search strategies for the different databases are presented in Online Appendix 1.
Other resources
We also searched the reference lists of other reviews and of eligible studies. In some cases where data were missing from otherwise eligible studies, the authors were contacted to determine whether complete data were available.
Study selection
Titles and abstracts of the identified studies were stored in a database. Duplicates were removed, and a bibliography including titles and abstracts was created. Study selection was completed in two steps. First, two authors independently screened the titles and abstracts of all references to determine whether each study met the inclusion criteria (AF screened all references, and each of the other co-authors screened a subset). A standardized digital form was used for this purpose, with the following inclusion criteria: working-age participants on SA due to CMDs or musculoskeletal disorders, a psychological intervention, and an RCT design. Second, all studies identified as possibly eligible in the first step were reviewed in full text by two review authors (AF and PE) and assessed for inclusion and methodological quality. The reason for exclusion (population, intervention, outcome, or design) was documented for each excluded study throughout the inclusion process. Figure 1 shows a flowchart of the inclusion of studies in the present meta-analysis, conducted according to the PRISMA criteria (Liberati et al. 2009).
Data extraction
The first author extracted the data into an extraction form covering essential study information, interventions, results on outcome measures, and data on moderator variables. These data were then double-checked by the second review author (PE). Disagreements about the data extraction were resolved by discussion until consensus was reached. Because studies reported SA in highly heterogeneous ways, the extraction of many studies had to be discussed. When no agreement on how to extract the data could be reached, e.g., because data needed to calculate effect sizes were missing, the study was excluded (see flowchart, Fig. 1).
Categorization of potential moderators
Two categories of moderators were investigated: categorical and continuous. Categorical moderators included factors related to the intervention and the study context. Continuous moderators included patient demographics and the methodological quality of the studies. The moderators are described further below.
Diagnostic group
Study populations were categorized as CMD (i.e., depression, anxiety or stress-related ill health), musculoskeletal disorders, or CMD and musculoskeletal disorders.
Diagnosis
Study populations were categorized as depression, adjustment disorders, musculoskeletal disorders, CMD or musculoskeletal disorders, or CMD (when the sample included a mix of mental health disorders).
Sickness absence duration
The number of weeks of continuous SA before randomization was noted for each study.
Type of treatment
The psychological interventions were categorized into five subcategories: CBT (various types of CBT not specifically targeting the work situation), W-CBT (the treatment manual specifically targets RTW or work processes), PST, SFT, and MMCBT (interventions delivered by at least two different professional categories). Control conditions were categorized as psychological interventions (if not the experimental condition in the trial), non-psychological interventions, treatment as usual (TAU), or waitlist (WLC).
Therapist profession
The professions of the therapists were categorized as occupational physician (including labour expert), psychologist (including psychotherapist), multimodal team (consisting of at least two professional categories), or other (including other mental health workers, social workers, stress management consultants, postgraduates, physical therapists, behaviour therapists, and one study where therapist profession was not specified).
Setting
Treatment setting was categorized as occupational health service, primary care, rehabilitation centre, and university.
Attrition
Participants who attended at least one session but dropped out before completing treatment were counted as dropouts. In studies that did not report the number of participants starting treatment, dropouts were counted relative to the number of participants randomized to treatment.
Other treatment-specific moderators
Several clinically justified moderators concerning the nature of the treatment were specified and categorized for each study. Duration was recorded as the number of weeks the intervention lasted (if there was no pre-defined intervention time, the number of weeks was used). The number of sessions, total treatment time (hours), intensity (hours per week), and use of booster sessions (Yes/No) were specified. It was further noted whether the intervention included workplace interventions (Yes/No) and whether it had a clear work focus, i.e., whether the full treatment protocol was tailored to target work or RTW (Yes/No). Whether the study evaluated therapist adherence to the treatment protocol (Yes/No) and therapist competence (Yes/No) was also noted. Statistical analysis was categorized as intention-to-treat (ITT) if all randomized participants were included in the analyses and as completer analysis if dropouts were excluded. Year of publication and country of origin were noted for each study.
Methodological quality
Methodological quality was rated with the psychotherapy outcome study methodology rating scale (Öst 2008), which was developed to allow a wider range of scores than earlier RCT methodology scales. The scale consists of 22 items, which are listed in Table 1.
Table 1 Items of the psychotherapy outcome study methodology rating scale
Two items, item 5 (specificity of outcome measures) and item 6 (reliability and validity of outcome measures), were adapted for evaluating SA/RTW measures. For specificity, incidence measures were rated poor, time-to-event measures fair, and continuous measures such as mean SA days, number of working hours, or recurrent SA days good. This categorization was based on the notion that continuous data lose specificity when dichotomized and therefore provide less information than the original continuous data. This may be particularly important for SA data: because SA status can vary during the follow-up period, the sum of SA days can be regarded as a more specific measure than the incidence of SA at a single follow-up point. For reliability and validity, self-reported data were rated fair and registry data good.
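To illustrate why incidence at a single follow-up point carries less information than cumulative SA days, the sketch below uses three hypothetical participants. The spell representation (half-open `[start, end)` day intervals), the 180-day follow-up, and the function names are assumptions for illustration only.

```python
def on_sa_at(spells, day):
    """True if the participant is on SA on `day`; spells are half-open [start, end) day intervals."""
    return any(start <= day < end for start, end in spells)


def cumulative_sa_days(spells, follow_up):
    """Total SA days within [0, follow_up), summed over one or more spells."""
    return sum(max(0, min(end, follow_up) - max(start, 0)) for start, end in spells)


FOLLOW_UP = 180  # hypothetical six-month follow-up, in days
participants = {
    "A": [(0, 10)],  # one short spell
    "B": [(0, 170)],  # one long spell ending shortly before follow-up
    "C": [(0, 30), (100, 130)],  # initial spell plus one recurrence
}

for pid, spells in participants.items():
    # Dichotomized view (on SA at follow-up?) vs continuous view (total SA days).
    print(pid, on_sa_at(spells, FOLLOW_UP), cumulative_sa_days(spells, FOLLOW_UP))
# A False 10
# B False 170
# C False 60
```

All three participants look identical under an incidence measure at day 180, whereas cumulative SA days separate them clearly.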
Each item is rated 0 = poor, 1 = fair, or 2 = good, giving a possible range of 0–44 points. The internal consistency of the scale was acceptable (Cronbach’s α = 0.622). Inter-rater reliability between the first and second authors, based on a random selection of 20% of the studies, was ICC(2,1) = 0.87 for the total score, indicating good overall inter-rater reliability.
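The score range and the two reliability statistics can be reproduced with a short standard-library sketch. The software the authors used for these statistics is not stated; the functions below assume the usual Shrout and Fleiss (1979) definition of ICC(2,1) (two-way random effects, single rater, absolute agreement) and the standard formula for Cronbach’s α. The example matrix is the 6-target, 4-rater illustration from Shrout and Fleiss (1979), not data from this review, and the function names are hypothetical.

```python
from statistics import mean, variance

N_ITEMS, MAX_ITEM_SCORE = 22, 2
MAX_SCORE = N_ITEMS * MAX_ITEM_SCORE  # 22 items rated 0-2 give the 0-44 range


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per rated study and one column per item."""
    k = len(scores[0])
    item_vars = [variance(col) for col in zip(*scores)]  # sample variance of each item
    total_var = variance([sum(row) for row in scores])  # sample variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(ratings):
    """ICC(2,1) after Shrout & Fleiss (1979): two-way random effects, single rater,
    absolute agreement. ratings has one row per target (study) and one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in ratings)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in zip(*ratings))
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)  # between-targets mean square
    jms = ss_cols / (k - 1)  # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
EXAMPLE = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(EXAMPLE), 2))  # 0.29
```

For the review itself, `ratings` would hold the two raters’ total scores for the randomly selected studies (two columns), and `scores` the 22 item ratings per study.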
Meta-analysis
In the present meta-analysis, data from the retrieved RCTs were used to calculate effect sizes (ESs) and to perform meta-analyses on the continuous outcomes (SA days, increased working hours, etc.) and on the proportions of participants who had achieved partial or full RTW. All analyses were conducted with Comprehensive Meta-Analysis (CMA), version 2.3. Because the ESs of the included studies cannot be expected to come from a single population of ESs (given the heterogeneity in type of work disability, duration of SA, and interventions across studies), a random-effects model was used to compute the pooled ESs. The results of each RCT were plotted as point estimates with corresponding 95% confidence intervals (CIs).
Most RTW results were reported as time-to-event data (SA days or time until partial or full RTW). Means and standard deviations (SDs) were extracted for the cumulative duration of SA and for the secondary outcomes (levels of depression, anxiety, and stress symptoms). The ES was calculated as $(M_{\text{intervention}} - M_{\text{control}})/SD_{\text{pooled}}$ for post- and follow-up assessments. Because a large proportion of the included studies had no pre-defined post-assessment, the mean of all follow-up assessment points was used to calculate ESs. Likewise, when a study reported more than one effect measure, the mean of these was used. On average, each study contributed 2.6 ESs for continuous measures and 2.5 ESs for categorical measures (all measurement points combined).
Before the ESs were pooled, the dataset was screened for statistical outliers. Instead of being deleted, outliers were Winsorized (Lipsey and Wilson 2001), i.e., reduced to exactly M + 2SD. Seven (6%) and four (8%) outliers were replaced in the datasets of continuous and categorical variables, respectively.
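The per-study ES steps described above (standardized mean difference with Hedges’ small-sample correction, averaging within a study, and Winsorizing before pooling) can be sketched in plain Python. This is an illustrative sketch, not CMA’s implementation: the function names are hypothetical, `study_effect` averages point estimates only, and `winsorize` assumes a sample SD and a symmetric lower bound at M − 2SD, whereas the text only states the M + 2SD rule for high values.

```python
from math import sqrt
from statistics import mean, stdev


def hedges_g(m_int, sd_int, n_int, m_ctl, sd_ctl, n_ctl):
    """Standardized mean difference (M_int - M_ctl) / SD_pooled, multiplied by Hedges'
    small-sample correction J. Returns (g, approximate sampling variance of g)."""
    sd_pooled = sqrt(((n_int - 1) * sd_int ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_int + n_ctl - 2))
    d = (m_int - m_ctl) / sd_pooled
    j = 1 - 3 / (4 * (n_int + n_ctl) - 9)  # correction factor J
    var_d = (n_int + n_ctl) / (n_int * n_ctl) + d ** 2 / (2 * (n_int + n_ctl))
    return j * d, j ** 2 * var_d


def study_effect(effect_sizes):
    """One ES per study: the mean over follow-up points and outcome measures (point estimates only)."""
    return mean(effect_sizes)


def winsorize(values, k=2.0):
    """Pull values beyond M +/- k*SD back to that boundary; returns (values, number replaced).
    Assumption: the lower bound M - k*SD mirrors the stated M + 2SD rule for high values."""
    m, s = mean(values), stdev(values)
    lo, hi = m - k * s, m + k * s
    clipped = [min(max(v, lo), hi) for v in values]
    return clipped, sum(v != c for v, c in zip(values, clipped))


# Hypothetical trial: fewer SA days in the intervention arm gives a negative g.
g, v = hedges_g(m_int=10, sd_int=5, n_int=20, m_ctl=15, sd_ctl=5, n_ctl=20)
print(round(g, 3), round(v, 3))  # -0.98 0.108
```

With the sign convention $(M_{\text{intervention}} - M_{\text{control}})$, a negative g on SA days favours the intervention.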
Hedges’ g was computed to correct for bias in small samples. Values of 0.20–0.49 represent small, 0.50–0.79 moderate, and 0.80 or higher large ESs (Cohen 1988). For event data, odds ratios (ORs) were computed; ORs of 1.5, 2.5, and 4 were interpreted as small, moderate, and large effects, respectively (Rosenthal 1996). Heterogeneity of the ESs was assessed with the Q-statistic (heterogeneity in ESs beyond random error) and the I² statistic (the percentage of the observed variance that reflects actual differences in ESs between studies). I² values of 25%, 50%, and 75% were taken to indicate low, medium, and high heterogeneity, respectively (Higgins et al. 2003). Publication bias in the primary outcome measures was assessed by inspecting the funnel plot, with the trim-and-fill method of Duval and Tweedie (2000), and with Egger’s regression intercept (Egger et al. 1997). Moderator analyses of continuous variables for which at least 75% of the studies provided information were carried out with the meta-regression module in CMA (fixed-effects model). For categorical variables, moderation was assessed with subgroup analyses using a mixed-effects model; when a subgroup being compared contained fewer than two studies, that subgroup was excluded. Cochran’s Q ($Q_{\text{between}}$) was computed to test whether the treatment subgroups differed in effect. Statistical significance was defined as p < 0.05.
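The pooling and heterogeneity statistics named above can be sketched in plain Python. The text does not state which between-study variance estimator CMA used; the sketch assumes the DerSimonian-Laird method-of-moments estimator, pools ORs on the log scale, and reports only the Egger intercept, not its p-value. Function names are hypothetical, and this is not CMA’s implementation.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log OR and its variance from a 2x2 table: a/b = RTW yes/no in the intervention arm,
    c/d = RTW yes/no in the control arm. Zero cells would need a continuity correction (not handled)."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects pooling.
    Returns (pooled ES, 95% CI, Q, I^2 in %, tau^2)."""
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2, tau2


def egger_intercept(effects, variances):
    """Egger's regression: ES/SE regressed on 1/SE by ordinary least squares.
    An intercept far from 0 suggests funnel-plot asymmetry (no p-value computed here)."""
    se = [math.sqrt(v) for v in variances]
    y = [e / s for e, s in zip(effects, se)]  # standardized effects
    x = [1 / s for s in se]  # precision
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx


pooled, ci, q, i2, tau2 = random_effects_dl([0.0, 1.0], [0.01, 0.01])
print(round(pooled, 3), round(q, 1), round(i2, 1), round(tau2, 2))  # 0.5 50.0 98.0 0.49
```

In the toy call, two very precise but conflicting studies give a large Q and an I² near 98%, so most of the observed variance is attributed to real between-study differences and tau² is large relative to the within-study variances.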