Participants and Context
Data were obtained between 2013 and 2016 in a study with a naturalistic observational design. The study sample consisted of patients with common mental disorders (CMD) who received work-focused treatment in an outpatient mental health clinic at Diakonhjemmet Hospital in Oslo, Norway. The sample was followed prospectively from intake at pre-treatment until 12 months after treatment. The treatment model, previously described by Gjengedal et al., consisted of short-term therapy with flexible, integrated interventions related to workplace assessment and adjustments and the drafting of return-to-work plans. The patients in the intervention group attended a mean of 10.40 sessions (SD = 3.09) over a mean duration of 17.74 weeks (SD = 6.67).
Patients were referred to the clinic by their general practitioners (GPs), who determined whether the patients were at risk of going on sick leave and certified their sick leave. Only participants who provided signed informed consent were included in the study.
The study cohort comprised 626 participants (Table 1), of whom 325 were on sick leave pre-treatment and 145 were still on sick leave post-treatment.
The clinic operates a routine outcome monitoring system, in which questions concerning work status and the complete RTW-SE, Beck Depression Inventory, Second edition (BDI-II) and Beck Anxiety Inventory (BAI) are administered to patients pre- and post-treatment.
The participants' primary ICD-10 diagnosis was current or recurrent depressive disorder (53.2%, n = 333), anxiety disorder (17.1%, n = 107), mixed anxiety and depression (12.1%, n = 76), or adjustment disorder (12.0%, n = 75); the remaining 5.6% (n = 35) had another primary diagnosis, such as an eating disorder, hypochondria, or a sleep disorder.
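As a quick consistency check, the reported percentages can be reproduced from the diagnostic counts over the full cohort (N = 626). The short labels used as dictionary keys below are our own shorthand for the diagnostic groups.

```python
# Counts per primary ICD-10 diagnostic group, as reported in the text
N = 626
diagnoses = {
    "depressive disorder": 333,
    "anxiety disorder": 107,
    "mixed anxiety and depression": 76,
    "adjustment disorder": 75,
    "other": 35,
}

# The five groups should account for the whole cohort
assert sum(diagnoses.values()) == N

# Percentage of the cohort in each group, rounded to one decimal place
shares = {name: round(100 * n / N, 1) for name, n in diagnoses.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The rounded shares (53.2, 17.1, 12.1, 12.0 and 5.6) add up to 100.0%.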
Translation and Wording of Items
To ensure both linguistic and clinical expertise in the validation process, an expert panel of clinical psychologists translated the RTW-SE scale from English into Norwegian. Three experienced clinical psychologists who are fluent in English then independently back-translated the Norwegian version into English, and the original author of the scale assessed these back-translations to confirm the quality of the translation.
We pre-tested the first translation on a group of approximately 10 patients. This pre-test showed that patients often misunderstood one negatively worded item ("I will not be able to handle potential problems at work"), which also had one of the lowest factor loadings in the original Dutch version. We therefore reworded this item as a positive statement ("I will be able to handle potential problems at work"). To evaluate this change, we compared the item's factor loading in the current study with the loading reported for the same item in the original development and validation study published by Lagerveld in 2010. The reworded item had a higher factor loading than in the original scale (Table 2).
Return-to-Work Self-efficacy (RTW-SE)
RTW-SE was measured using the previously described 11-item scale. Each item completes the stem "If I resume my work fully tomorrow in my current health situation, I expect that:"; examples include (1) "I will be able to perform my tasks at work" and (2) "I will be able to concentrate on my work." Because patients were either on sick leave or working when they completed the scale, we did not refer to it as the RTW-SE in contact with patients. Responses range from 1 ("totally disagree") to 6 ("totally agree"), and the total RTW-SE score is the mean of the 11 item scores, giving a continuous score from 1 to 6; higher scores indicate higher return-to-work self-efficacy. In the first validation study, the internal consistency of the scale was high over time and across subgroups, with Cronbach's alpha coefficients above 0.80.
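The scoring rule above can be sketched as follows. This is a minimal illustration, assuming each of the 11 items is coded 1 ("totally disagree") to 6 ("totally agree") with no missing responses; the function name `rtw_se_total` and the example responses are hypothetical.

```python
from statistics import mean

N_ITEMS = 11                   # the RTW-SE has 11 items
MIN_ITEM, MAX_ITEM = 1, 6      # assumed item coding: 1 = totally disagree, 6 = totally agree


def rtw_se_total(items: list[float]) -> float:
    """Return the RTW-SE total score as the mean of the 11 item scores (range 1-6)."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(items)}")
    if any(not MIN_ITEM <= x <= MAX_ITEM for x in items):
        raise ValueError("item scores must lie between 1 and 6")
    return mean(items)


# Hypothetical responses from one patient
responses = [4, 5, 3, 4, 4, 5, 6, 3, 4, 5, 4]
print(round(rtw_se_total(responses), 2))  # 47 / 11 -> 4.27
```

Using the mean rather than the sum keeps the total on the same 1 to 6 metric as the individual items.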
Beck Depression Inventory—Second edition (BDI-II)
The BDI-II is one of the most widely used self-report measures of the presence and severity of depressive symptoms during the previous two weeks. Its 21 items are rated on a 4-point Likert scale from 0 to 3 and summed to give a total score from 0 to 63, with higher scores indicating more severe depression over the last two weeks. The BDI-II has adequate psychometric properties. Scores of 0–13 indicate minimal depression, whereas scores of 14–19, 20–28, and 29–63 indicate mild, moderate, and severe depression, respectively. In the present study, the Cronbach's alpha coefficient of the BDI-II was 0.89.
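The scoring and severity bands described above translate directly into code. This is a minimal sketch, assuming complete item data coded 0 to 3; the function names are illustrative and not part of any published BDI-II software.

```python
def bdi_ii_total(items: list[int]) -> int:
    """Sum of the 21 BDI-II items, each rated 0-3 (total range 0-63)."""
    if len(items) != 21:
        raise ValueError(f"expected 21 items, got {len(items)}")
    if any(x not in (0, 1, 2, 3) for x in items):
        raise ValueError("items must be rated 0, 1, 2 or 3")
    return sum(items)


def bdi_ii_severity(total: int) -> str:
    """Map a BDI-II total to the severity bands 0-13, 14-19, 20-28 and 29-63."""
    if not 0 <= total <= 63:
        raise ValueError("BDI-II total must lie between 0 and 63")
    if total <= 13:
        return "minimal"
    if total <= 19:
        return "mild"
    if total <= 28:
        return "moderate"
    return "severe"


# Hypothetical patient answering 1 on every item
total = bdi_ii_total([1] * 21)
print(total, bdi_ii_severity(total))  # 21 moderate
```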
Beck Anxiety Inventory (BAI)
The BAI is a 21-item self-report inventory for assessing symptoms of anxiety during the previous week. The items are rated on a 4-point Likert scale from 0 to 3, and the total score ranges from 0 to 63. The BAI has been found to be reliable and valid for measuring symptoms across different anxiety disorders. In the current study, the Cronbach's alpha coefficient of the BAI was 0.90.
Return to Work
At pre-treatment, patients reported their work status in a self-report questionnaire as working fully, on partial sick leave, or on full sick leave.
Follow-up data on work status at 3, 6 and 12 months after treatment were derived from the National Social Insurance Register (NAV registry), which ensured that no participants were lost to follow-up. The register includes information on whether each individual was on full or partial sick leave. Full return to work was defined as working 100% at each of these time points, as registered in the NAV registry.
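The binary outcome can be sketched as below, using the same coding as in the regression analyses (full return to work = 1, partial or full sick leave = 0). The `WorkStatus` enum and the example follow-up record are hypothetical representations, not the actual structure of the NAV registry data.

```python
from enum import Enum


class WorkStatus(Enum):
    """Hypothetical representation of the registered work status at one time point."""
    FULL_WORK = "working 100%"
    PARTIAL_SICK_LEAVE = "partial sick leave"
    FULL_SICK_LEAVE = "full sick leave"


def full_rtw(status: WorkStatus) -> int:
    """1 = full return to work (working 100%), 0 = partial or full sick leave."""
    return 1 if status is WorkStatus.FULL_WORK else 0


# Hypothetical patient: status at the 3-, 6- and 12-month follow-ups
follow_up = {
    3: WorkStatus.PARTIAL_SICK_LEAVE,
    6: WorkStatus.FULL_WORK,
    12: WorkStatus.FULL_WORK,
}
outcomes = {month: full_rtw(status) for month, status in follow_up.items()}
print(outcomes)  # {3: 0, 6: 1, 12: 1}
```

Note that partial sick leave counts as 0, so a patient who is working part time has not achieved full return to work under this definition.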
Data were analysed using STATA version 14.0. We evaluated internal reliability by calculating Cronbach's alpha. The underlying factor structure of the RTW-SE scale was explored with an exploratory principal component analysis, retaining components with eigenvalues greater than 1 (Kaiser's rule). Correlations between the RTW-SE, BDI-II and BAI were examined with Pearson's correlation coefficients. To explore construct validity, we used ANOVA with post hoc pairwise comparisons to test whether RTW-SE scores differed significantly between participants on full sick leave, partial sick leave and full work. To study predictive validity, we performed logistic regression analyses with the pre- and post-treatment RTW-SE scores as predictors and full return to work post-treatment and at the 3-, 6- and 12-month follow-ups as the dependent variables.
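The reliability and dimensionality checks above can be illustrated with NumPy. The analyses in this study were run in STATA; the Python sketch below is only an illustration of the same two quantities, Cronbach's alpha and the number of principal components retained under Kaiser's rule (eigenvalue > 1 of the item correlation matrix). It assumes complete item data and uses simulated responses, not study data.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items matrix with no missing values."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


def kaiser_n_components(items: np.ndarray) -> int:
    """Number of principal components of the item correlation matrix with eigenvalue > 1."""
    corr = np.corrcoef(items, rowvar=False)  # columns are items
    eigenvalues = np.linalg.eigvalsh(corr)   # symmetric matrix, so eigenvalues are real
    return int((eigenvalues > 1.0).sum())


# Simulated data: 500 respondents, 11 items all driven by one latent trait
rng = np.random.default_rng(seed=0)
latent = rng.normal(size=(500, 1))
items = latent + 0.5 * rng.normal(size=(500, 11))

print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"components retained (Kaiser) = {kaiser_n_components(items)}")
```

Because the simulated items share a single latent trait, one component is retained and alpha is high. With real data, Kaiser's rule is usually read alongside the scree plot and item loadings rather than on its own.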
We constructed receiver operating characteristic (ROC) curves using the post-treatment RTW-SE score as the classifier and work status (working fully, graded sick leave, full sick leave) to define the reference groups. ROC analysis is widely used to select clinically optimal cut-off scores by testing a scale's ability to discriminate between groups [31, 32]. To determine appropriate cut-off values for the return-to-work process, ROC analyses were performed on the post-treatment scores of the subgroup of patients on sick leave pre-treatment (n = 314). We estimated two post-treatment cut-off scores, because previous research suggests that return to work is not a single event but a continuum reflecting a gradual process. First, we estimated an upper cut-off score using full work vs. sick leave (graded or full) post-treatment as the reference variable. Second, we estimated a lower cut-off score using graded vs. full sick leave as the reference variable among the patients still on sick leave after treatment (n = 145). The accuracy of each ROC analysis was estimated from the area under the curve (AUC), which summarizes the sensitivity (true-positive rate) and specificity (true-negative rate) of the test relative to the reference groups across the entire range of RTW-SE scores. An AUC of 0.5 indicates discrimination no better than chance and an AUC of 1.0 indicates perfect discrimination; an AUC of 0.7–0.8 is considered acceptable, 0.8–0.9 excellent, and above 0.9 outstanding [31, 33]. The optimal cut-off values were identified using the Youden index, J = sensitivity + specificity − 1, by selecting the score that maximizes J.
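The ROC and Youden logic can be sketched in plain Python. This is an illustration, not the STATA procedure used in the study. It assumes that a case is classified as positive (for example, working fully) when its RTW-SE score is at or above the candidate cut-off, and that every observed score is tried as a cut-off. AUC is computed with the Mann-Whitney formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half). The example scores and labels are invented.

```python
from typing import Sequence


def _split(scores: Sequence[float], labels: Sequence[int]) -> tuple[list[float], list[float]]:
    """Separate scores into positive (label 1) and negative (label 0) reference groups."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each reference group")
    return pos, neg


def roc_points(scores, labels) -> list[tuple[float, float, float]]:
    """(cut-off, sensitivity, specificity) for each observed score used as a cut-off."""
    pos, neg = _split(scores, labels)
    points = []
    for cut in sorted(set(scores)):
        sensitivity = sum(s >= cut for s in pos) / len(pos)  # true-positive rate
        specificity = sum(s < cut for s in neg) / len(neg)   # true-negative rate
        points.append((cut, sensitivity, specificity))
    return points


def auc(scores, labels) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos, neg = _split(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels) -> tuple[float, float]:
    """Cut-off maximising J = sensitivity + specificity - 1 (lowest cut-off wins ties)."""
    cut, sens, spec = max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)
    return cut, sens + spec - 1


# Invented post-treatment RTW-SE scores; 1 = working fully, 0 = on (graded or full) sick leave
rtw_se = [2.1, 2.8, 3.0, 3.5, 3.9, 4.2, 4.4, 4.8, 5.1, 5.6]
full_work = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(f"AUC = {auc(rtw_se, full_work):.2f}")
print("Youden cut-off and J:", youden_cutoff(rtw_se, full_work))
```

The lower cut-off would be found the same way, restricting the data to patients still on sick leave and coding graded sick leave as 1 and full sick leave as 0.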
The predictive validity of the post-treatment cut-off scores was examined in the subgroup still on sick leave post-treatment (n = 145) using logistic regression models with full return to work at the 3-, 6- and 12-month follow-ups as the dependent variables. The group with scores below the lower cut-off served as the reference category. Full return to work was coded as 1, and partial or full sick leave as 0. Missing data for individual items on the RTW-SE, BDI-II and BAI were replaced by weighted means. Effect sizes were calculated as Cohen's d using the pooled standard deviation.
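Two steps from this paragraph can be sketched in code: assigning a post-treatment score to one of the three groups defined by the lower and upper cut-offs, and computing Cohen's d with a pooled standard deviation. The group labels, the function names and the treatment of scores exactly at a cut-off (counted as at or above it) are assumptions made for illustration. No cut-off values are filled in here.

```python
from math import sqrt
from statistics import mean, variance


def cutoff_group(score: float, lower: float, upper: float) -> str:
    """Place an RTW-SE score relative to two cut-offs.

    Assumption: a score equal to a cut-off counts as at or above it.
    """
    if lower >= upper:
        raise ValueError("lower cut-off must be below upper cut-off")
    if score < lower:
        return "below lower"      # reference category in the logistic regression
    if score < upper:
        return "between"
    return "at or above upper"


def cohens_d(group1: list[float], group2: list[float]) -> float:
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Toy example: means 4 and 3, both SDs 2, so d = 1 / 2
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```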
This study qualified as health-service research and was therefore approved in advance by the Norwegian Data Protection Authority. Patients signed an informed consent form and could withdraw their consent at any time without providing an explanation. The study was conducted according to the principles of the Declaration of Helsinki.