Introduction

Therapeutic homework, in the sense of inter-session activity, represents a central component of psychotherapy and is particularly inherent to cognitive behavioral therapy (CBT; Beck et al. 1979). The core principle of this treatment is to equip patients with tools to change thoughts, behaviors, emotions, and their interplay. Homework may be defined as activities carried out between sessions in order to practice skills outside of therapy and to generalize them to the natural environment (Kazantzis and L’Abate 2007; Lambert et al. 2007). Rather than exclusively discussing problems in an isolated setting, patients are encouraged to address the problem in their everyday life with the intention of producing and maintaining a therapeutic effect (Lambert et al. 2007). The theorized mechanisms of the effect of homework build upon the skills-building approach of CBT (Beck et al. 1979; Detweiler and Whisman 1999), as therapeutic exercises provide an opportunity for the patient to gather information and practice newly gained skills. Ultimately, practicing skills outside therapy helps patients become aware of the problem and consolidate new beliefs and behaviors (Beck et al. 1979). Homework thus serves as a means of transferring strategies outside the therapy context and enables the patient to practice new skills in real-life situations in order to maintain therapeutic gain (Kazantzis and Ronan 2006).

Homework is a commonly studied process variable in CBT and has empirically been investigated primarily in association with treatment outcome. Previous research has demonstrated that a high level of homework compliance is related to improvements in depressive symptoms (e.g., Kazantzis et al. 2010). Meta-analyses have established correlational evidence for the homework compliance and outcome relationship (e.g., Mausbach et al. 2010) as well as experimental evidence for the superiority of treatments that incorporate homework over treatments without homework (Kazantzis et al. 2010, 2016).

It has previously been noted that an “evidence-based” assessment of homework compliance (Dozois 2010, p. 158) requires the consideration of qualitative aspects of homework completion throughout the course of the treatment (Dozois 2010; Kazantzis et al. 2010, 2017). This has been neglected in previous studies on the homework-outcome relationship, which rely solely on adherence or compliance measures that focus on the proportion of completed homework or on global single-item measures of whether the patient attempted the homework or not (e.g., Bryant et al. 1999; Aguilera et al. 2018). In a recent systematic review of homework adherence assessments in major depressive disorder (MDD), Kazantzis et al. (2017) found that only 2 out of 25 studies reported measures that addressed the quality of homework completion. Furthermore, the single-item Assignment Compliance Rating Scale (ACRS; Primakoff et al. 1986) does not capture the depth of patients’ engagement with homework, and the Homework Rating Scale (HRS; Kazantzis et al. 2004) is a client self-report measure, which might over- or underestimate homework compliance compared to objective measures. Studies are increasingly focusing on qualitative aspects of homework completion. For this reason, the term and concept of homework engagement (HE) has been deemed relevant: it refers to the extent to which a patient has completed homework in an elaborate and clinically meaningful manner (Dozois 2010; Conklin and Strunk 2015). Furthermore, less empirical attention has been paid to underlying mechanisms going beyond patient factors, including therapist behaviors influencing HE and their relation to depressive symptoms.

Homework-Related Therapist Behaviors

Theoretical considerations and clinical recommendations regarding therapist behaviors related to homework (TBH) mainly build on four strategies suggested by Beck et al. (1979): (1) homework should be described clearly and should be specific; (2) homework should be assigned with a cogent rationale; (3) patients’ reactions should be elicited in order to troubleshoot difficulties; (4) progress should be summarized when reviewing homework. Expert clinicians have also pointed out the value of formulating simple and feasible homework tasks and emphasized patient involvement when developing homework assignments that are agreeable to the patient (Kazantzis et al. 2003; Tompkins 2002). Moreover, factors such as the match between the assignment and the client, as well as the wording of the homework task, should be considered (Detweiler and Whisman 1999).

The suggested domains have also received some empirical attention. To our knowledge, four studies have focused on TBH in face-to-face treatment of MDD, and their findings are inconsistent. First, Startup and Edmonds (1994) investigated whether patient ratings of therapist behaviors promoting homework compliance were associated with therapist-rated homework compliance in a sample of 25 patients. The results did not demonstrate a significant relation between any facet of TBH (providing rationale, clear description, anticipation of problems, involving the patient) and homework compliance, which was largely attributed to ceiling effects of the patients’ ratings of TBH. Second, Bryant et al. (1999) assessed observer-rated homework compliance and TBH (reviewing previous assignment, providing rationale, clearly assigning and tailoring, seeking reactions and troubleshooting problems) in 26 depressed patients receiving cognitive therapy (CT). The study confirmed that more compliant patients experienced greater symptom improvement, and demonstrated a non-significant trend suggesting a relation between the overall score of the therapist homework behavior scale and homework compliance. Item-based analyses, however, demonstrated that therapist reviewing behavior (TBH-R), but not therapist assigning behavior (TBH-A), was related to homework compliance. Third, in a sample of adolescents with depression, Jungbluth and Shirk (2013) demonstrated that providing a strong rationale and allocating more time in the beginning of treatment predicted greater homework compliance in the subsequent session, especially for initially resistant individuals. Fourth, the most recent study, conducted by Conklin et al. (2018), evaluated three classes of TBH in a sample of 66 patients with MDD undergoing CT. The authors reported that TBH-A, but not TBH-R, was predictive of HE in the early sessions of CT, which stands in contrast to the findings of Bryant et al. (1999).

In consideration of the therapist’s prominent role in making use of therapeutic homework and the available inconclusive findings, the contribution of TBH to HE and their relation to depressive symptoms needs further exploration.

Homework Engagement in Telephone-Based CBT

The introduction of low-intensity CBT led to a way of delivering evidence-based treatments that is characterized by limited therapist input, technological support, and increased use of self-help. These features are combined in telephone-based CBT (tel-CBT). Tel-CBT puts emphasis on patients’ independent engagement with the therapeutic contents outside of therapy sessions by making systematic use of homework activities. The therapist plays an active role in structuring the treatment, providing input, and facilitating the comprehension and the use of homework. To the best of the authors’ knowledge, only a limited number of studies have addressed homework in guided self-help and technology-supported treatment. One study investigating overall and component-specific homework compliance in an internet-based treatment with minimal therapist guidance found that overall homework compliance accounted for 15% of the reductions in depressive symptoms (Kraepelien et al. 2019). Another study investigated TBH-R and homework completion in telephone-delivered CBT (Aguilera et al. 2018). The authors found that the number of sessions in which a patient completed homework was related to a decrease in depressive symptoms at the end of treatment. This relationship disappeared when taking into account TBH-R, which, however, was positively associated with symptom reduction. These findings suggest that aspects of TBH are important factors for improved symptom outcome, but that TBH does not moderate the effect of homework compliance on improved symptom outcome (Aguilera et al. 2018).

Given the emphasis on patients’ contribution and self-reliance in the present treatment format, the assessment of HE might be a relevant process variable related to treatment outcome and an important therapy process that therapists can build upon. We would like to extend the current literature by using HE—a construct that is conceptually different from homework compliance and adherence—and by evaluating all sessions of the treatment (on average 9 sessions). This allows gaining a deeper understanding of the course of HE and TBH as well as the potential association between these variables and depressive symptoms.

Aim of the Current Study

The overall aim of the study is to provide insight into the occurrence and the course of HE and TBH in tel-CBT for depression. Additionally, first evidence on the relationship between HE, TBH, and depressive symptoms should be provided. Three objectives are pursued: (1) The assessment of the amount of homework, the proportion of different homework types, and the types of difficulties faced by patients when engaging with homework; (2) the description of initial status and course of HE and TBH in tel-CBT; (3) first examination of the relation between HE, TBH, and depressive symptoms over the course of the treatment.

Methods

Patients

The current study draws on data from a randomized controlled trial (RCT; Haller et al. 2019) investigating the effectiveness of tel-CBT compared to treatment as usual. Information on detailed study procedures and methods of the overarching RCT can be found in the study protocol (Watzke et al. 2017). The trial was approved by the local Ethics Committee. Inclusion criteria for the study were a PHQ-9 score of > 5 and ≤ 15, a diagnosis of mild or moderate depression according to ICD-10 (F32.0, F32.1, F33.0, F33.1), and the provision of a written informed consent. Patients were excluded, if they showed suicidality (item 9 > 0 on PHQ-9) or severe or chronic depression (F32.2, F34.1), if their physical or mental condition did not allow completion of questionnaires, if they were not proficient in the German language, or if they were in psychotherapeutic or psychological treatment at the time of intake or 3 months prior. For the main trial, 152 patients were screened for eligibility, of which 54 were included and randomized to either intervention or control group.

Data of each therapy session from patients randomized to the intervention group, i.e., those who received and completed the tel-CBT (N = 24), were used. We included data from all patients for whom more than 80% of the therapy sessions were available and audio-recorded. The sample for the current study was therefore reduced to N = 22 because, for two patients, the majority of therapy sessions were missing due to technical failure to record. The two excluded patients did not differ from the intervention group in clinical status and sociodemographic variables, with the exception that their age was in the lower range.

Therapists

For the included 22 patients, three therapists who were employed at the University’s outpatient clinic were involved in providing tel-CBT. All therapists were female and 34 years old on average (SD = 5.9). The therapists were clinical psychologists with previous experience in treating patients with depression, and were in advanced training of CBT (current duration of training: M = 4.3 years, SD = 1.5). They received specific training in tel-CBT prior to the study and regular supervision by a senior clinician and researcher (BW) during the treatment provision.

Treatment

Tel-CBT starts with a personal face-to-face session with the therapist and comprises 8–12 subsequent telephone sessions, which last between 30 and 40 min. The treatment program is called “Creating a balance” and is conceptualized as a guided self-help CBT delivered over the telephone. The content is based on core CBT elements—psychoeducation, behavioral activation, cognitive restructuring, and relapse prevention—within a total of eight chapters. The intervention entails a treatment manual for therapists and a workbook for patients to read and practice skills in between sessions. Each chapter is structured in a psychoeducational part with reading materials and case vignettes and a practical part with step-by-step instructions for exercises (i.e., homework). Copies of additional worksheets to complete homework are provided at the end of each chapter. Therapists were instructed to adhere closely to the treatment manual. This included agreeing upon a homework assignment in each therapy session, and reviewing the previously assigned homework at the beginning of the subsequent therapy session. The types of homework in the treatment manual were classified as: (1) Psychoeducational homework, including reading materials and case vignettes; (2) behavioral homework, including scheduling and undertaking pleasant activities; (3) cognitive homework, including replacing dysfunctional thoughts; (4) self-monitoring homework, referring to observing and monitoring thoughts and emotions; and (5) relapse prevention homework, including recognizing warning signs and establishing an emergency plan.

Measures and Assessment

Global Homework Engagement Scale (GHES). We developed an instrument measuring global HE independent of the type of homework assigned. The previously established homework engagement scale (HES) for CT by Conklin and Strunk (2015) served as a basis for the instrument. GHES consists of seven items regarding quantitative and qualitative aspects of homework completion. Each item is described in detail and is assessed on a 5-point Likert scale, varying from 0 (not at all) to 4 (considerably). Each of the five item manifestations contains a verbal anchoring tailored to the respective item in order to determine specific criteria connected to the rater’s decision, helping to ensure a uniform understanding of each item’s characteristics. The seven items cover the following aspects of HE: (1) extent to which patients engaged with homework tasks; (2) whether and to what extent patients carried out homework as agreed upon; (3) whether and to what extent patients applied learnt strategies in difficult times; (4) the intensity of HE; (5) whether and to what extent patients faced difficulties when carrying out homework; (6) whether and to what extent patients could benefit from completed homework tasks; (7) estimated time that patients spent on HE. Additionally, and similarly to HES by Conklin and Strunk (2015), the scale contains two items which serve as a homework log. In the first log-item, homework from the previous session that was reportedly completed was written down by the raters. For the second log-item, research assistants recorded homework assignments for the next session before the rating procedure started. This procedure ensured that raters were informed about which previously assigned homework the discussion in a session referred to. For the global GHES score, an average score of items 1 to 7 is calculated, with higher scores indicating more HE.

Scale for Therapeutic Homework Assignment and Review (StHAR). An instrument to assess TBH was constructed for the purpose of this study. The instrument consists of eight items covering the process of assigning the upcoming homework (TBH-A) and the process of reviewing previously assigned homework (TBH-R). All items are assessed on a 5-point Likert scale, varying from 0 (not at all) to 4 (considerably). Each item is described in detail and contains a verbal anchoring for each item manifestation. The five items covering TBH-A build the subscale StH-A and comprise: (1) providing a rationale for the homework; (2) tailoring the homework to the individual situation; (3) addressing potential challenges of completing the homework; (4) specifying the homework; (5) ensuring comprehension of the homework. The subscale StH-R includes three items relating to TBH-R: (1) extent of discussing previous homework; (2) drawing conclusions of the homework; and (3) using homework to strengthen self-efficacy expectation of patient. The global StHAR score is calculated with an average score of all items, with StH-A items used from the previous session and StH-R items used from the subsequent session. Higher scores indicate a larger extent of TBH. Items from both scales are displayed in Table 1. The German versions of the scales can be retrieved upon request from the corresponding author.
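The pairing of sessions in the global StHAR score can be illustrated with a short sketch. It assumes, as described above, that the five StH-A ratings from one session are combined with the three StH-R ratings from the following session and averaged over all eight items. The function name, the dictionary keys, and the data layout are hypothetical and are not taken from the study materials.

```python
from statistics import mean


def sthar_global_scores(sessions):
    """Return one global StHAR score per homework cycle.

    `sessions` is a list ordered by session number. Each entry is a dict with
    'assign' (five StH-A ratings, 0-4) and 'review' (three StH-R ratings, 0-4).
    This layout is a hypothetical choice made for illustration only.
    """
    scores = []
    # Pair each session with the next one: homework assigned in the earlier
    # session is reviewed in the later session.
    for previous, current in zip(sessions, sessions[1:]):
        items = list(previous["assign"]) + list(current["review"])
        scores.append(mean(items))
    return scores
```

With n rated sessions, this yields n - 1 scores, because the first session has no preceding assignment to pair with its review items.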

Table 1 Means and standard deviations for items and total score of GHES and StHAR measures across all sessions

Patient Health Questionnaire (PHQ-9). Depressive symptoms were assessed at the beginning of each session using the German version of the PHQ-9 (Löwe et al. 2002). Nine items regarding primary and secondary depression symptoms are assessed on a 4-point Likert scale and yield a sum score between 0 and 27. Therapists went through each item of the PHQ-9 right at the beginning of each session as part of the symptom monitoring. Patients had a copy of the PHQ-9 in front of them and indicated how often each symptom was present, from 0 (none of the days) to 3 (almost every day). Although originally developed as a self-report measure, telephone administration of the PHQ-9 seems to be a reliable and valid procedure to assess depression (Pinto-Meza et al. 2005).
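As a minimal sketch of the scoring described above, the following functions compute the PHQ-9 sum score and apply the PHQ-9-related screening rules stated under Patients (sum score > 5 and ≤ 15, item 9 equal to 0). The function names are hypothetical; the ICD-10 diagnosis and the other eligibility criteria are not modelled here.

```python
def phq9_sum_score(item_responses):
    """Return the PHQ-9 sum score (0-27) from nine responses coded 0-3."""
    if len(item_responses) != 9:
        raise ValueError("PHQ-9 requires exactly nine item responses")
    for response in item_responses:
        if response not in (0, 1, 2, 3):
            raise ValueError("each item must be coded 0, 1, 2, or 3")
    return sum(item_responses)


def meets_phq9_screening_rules(item_responses):
    """Check only the PHQ-9 part of the inclusion and exclusion criteria."""
    total = phq9_sum_score(item_responses)
    no_suicidality = item_responses[8] == 0  # item 9 is the ninth response
    return 5 < total <= 15 and no_suicidality
```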

Procedure

Ratings of Tel-CBT Sessions

Audio recordings were available for all therapy sessions of the included 22 treatments. All available recordings of per protocol therapy sessions were included in the dataset. We did not include the initial face-to-face appointment, as this was not relevant for the assessed process variables. Of the 210 tel-CBT sessions that had taken place within this sample, we were able to rate 194 sessions (92.4%). We had to exclude sessions that deviated from the treatment manual (n = 4) and sessions for which audio recordings were not available or unusable due to technical failure to record the session or due to poor quality of the recording (n = 12). Deviation from the treatment manual is defined as a session that did not target the planned content. This was the case when therapists had to react to a crisis situation of the patient. The mean duration of one telephone session was 43 min (SD = 9.6).

Raters and Rater Training

HE and TBH were rated by five independent raters (one doctoral candidate and four Master-level students in clinical psychology). All raters were blind to treatment outcome of the patients. During a period of 4 weeks, raters received 54 hours of training in the employed treatment manual and the use of the rating instruments. Training consisted of discussing the content of the treatment manual, particularly homework types in the tel-CBT. Furthermore, definitions of adequate and competent therapist behaviors regarding the assignment and review of homework were discussed. Following the training phase, three successive trial ratings were completed by the raters. Each trial rating was discussed and, in case of disagreement, the wording of the items was refined until consensus was reached. Prior to the rating phase, three therapy sessions from two excluded cases were randomly selected and rated by all five raters in order to examine initial inter-rater reliability (IRR). Calculation of intra-class correlation coefficients (ICC) in a two-way random model ICC(2,2) (Shrout and Fleiss 1979) revealed an average ICC(2,2) of .91 and a median ICC(2,2) of .93 across all raters and all items of GHES, and an average ICC(2,2) of .81 and a median ICC(2,2) of .88 across all raters and all items of StHAR. This result indicated that IRR was high and that formal ratings could start subsequently.

Rating Procedure

All items were rated on a 5-point Likert scale in order to determine the estimated extent of the patient's HE as well as the extent of TBH. Raters were encouraged to take notes while listening to the audio file and to rate all items at the end of the session. Of the 194 eligible audio recordings, each rater was randomly assigned between 32 and 38 sessions for the main rating. Session allocation was stratified by therapist, patient, and treatment phase (phase I: sessions 1–4; phase II: sessions 5–9). A subsample of therapy sessions was double-coded in order to establish IRR. Forty percent of all sessions were drawn for double-rating, resulting in a total of 57 to 62 sessions rated per rater. Each rater was paired with every other rater an approximately equal number of times. For the double-rated sessions, the average score of the rater pair for each item was used in the final analyses.

Statistical Analysis

As GHES and StHAR are newly developed rating instruments, analyses of the psychometric properties were conducted before turning to the research questions under investigation. We calculated Pearson's r for corrected item-total correlations and coefficient omega (ω) to measure internal consistency of both scales. IRR was assessed by calculating ICC in a two-way random model, ICC(2,2) (Shrout and Fleiss 1979), testing for absolute agreement between two raters and within one rater, respectively.
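To make the reliability index concrete, the sketch below computes the two-way random, absolute-agreement coefficients of Shrout and Fleiss (1979) from a complete table of targets (rows) by raters (columns). It is an illustration of the published formulas, not the analysis code used in this study, which relied on the psych package in R. The function name is hypothetical.

```python
def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for an n-targets x k-raters table.

    `ratings` is a list of rows with no missing values; every row holds the
    ratings of one target (e.g., one session) by the same k raters.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single-rater and average-of-k-raters coefficients (absolute agreement).
    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average
```

For double-rated sessions, each table has k = 2 columns, so the second returned value corresponds to ICC(2,2).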

In order to meet research objective one, the types of homework assigned as well as types of difficulties faced when completing homework are reported. Moreover, descriptive statistics (means and standard deviations) of the individual items and the total scores of the scales GHES and StHAR (including subscales StH-A and StH-R) are presented. For research objective two, multilevel mixed models (MLM) were applied to examine between- and within-patient variability of HE and TBH over the course of treatment in a nested data set. In two-level models, HE and TBH assessed at each of the nine telephone sessions (level 1) are modelled within each of the 22 individuals (level 2). The inter-individual variability in terms of initial status and growth of HE and TBH is modelled at level 2. For research objective three, an MLM was fitted with depressive symptoms measured with PHQ-9 defined as criterion on level 1. Depressive symptoms were assessed in each session. HE of the same session, and TBH (consisting of TBH-A of the previous session and TBH-R of the current session), were gradually introduced as time-varying predictors of the session-specific symptom severity. In total, five stepwise built multilevel models were calculated. First, the null or unconditional model was created, including the intercept and the random term (null-model). Second, the null-model was expanded by adding a random slope for time (model 1). Third, one time-varying predictor (HE) was introduced into the random intercept random slope model (model 2). Lastly, random intercept and random slope models with two time-varying predictors (HE and TBH; model 3) and an interaction term between HE and TBH (model 4) were created. A separate model that included HE as criterion and TBH as predictor was analysed.
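For orientation, model 3 can be written out in two-level notation. The following is a sketch under the assumption that only the intercept and the time slope vary randomly across patients, while the effects of HE and TBH are fixed; the symbols are introduced here for illustration and do not appear in the original analysis.

```latex
% Level 1 (session t within patient i)
\mathrm{PHQ}_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{Time}_{ti}
                  + \pi_{2}\,\mathrm{HE}_{ti} + \pi_{3}\,\mathrm{TBH}_{ti} + e_{ti},
\qquad e_{ti} \sim \mathcal{N}(0, \sigma^{2})

% Level 2 (patient i)
\pi_{0i} = \beta_{00} + r_{0i}, \qquad
\pi_{1i} = \beta_{10} + r_{1i}, \qquad
\pi_{2} = \beta_{20}, \qquad
\pi_{3} = \beta_{30}
```

Under this reading, the null-model retains only \beta_{00}, r_{0i}, and e_{ti}; model 1 adds the time terms; model 2 adds \beta_{20}; and model 4 would add a product term \mathrm{HE}_{ti} \times \mathrm{TBH}_{ti} at level 1.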

All models were estimated using restricted maximum likelihood (REML). In order to compare the appropriateness of the specified models, AIC, BIC, and log-likelihood values were used. Analyses were performed using R software (version 3.6.0; R Core Team 2014), the lme4 package (Bates et al. 2015), and the psych package (Revelle 2019).

Results

Descriptive Statistics of Sample

Baseline sociodemographic and clinical characteristics of the N = 22 included patients are displayed in Table 2. The majority of the sample was female and on average 56 years old (SD = 18.1). Symptom severity ranged from mild to moderately severe levels of depression (6 ≤ PHQ-9 ≤ 20) at the beginning of treatment resulting in a moderately depressed status on average.

Table 2 Baseline sociodemographic and clinical characteristics in the study sample

Psychometric Properties of GHES and StHAR

With regard to psychometric properties of the scales, corrected item-total correlations ranged from .46 to .78 for GHES and from .39 to .61 for StHAR. Internal consistency of GHES was excellent across treatment (ω = .87), with values ranging from .79 to .91 across sessions. Internal consistency for StHAR was good across treatment (ω = .80), with values ranging from .63 to .87 across sessions. Internal consistency was .73 for StH-A and .68 for StH-R. We calculated ICC using a two-way random effects model, ICC(2,2) (Shrout and Fleiss 1979), to estimate IRR. For GHES, ICCs(2,2) across all rater dyads ranged from .41 to .81, resulting in a moderate average ICC(2,2) of .68 as well as a moderate median ICC(2,2) of .70. For StHAR, ICCs(2,2) across rater dyads ranged between .45 and .83, resulting in a moderate average ICC(2,2) of .64 and a moderate median ICC(2,2) of .64. Due to the good psychometric properties of StHAR, the global StHAR score was used instead of the subscales StH-A and StH-R in further analyses.

Descriptive Statistics of Homework, HE, and TBH

Across all telephone sessions and patients, 411 homework activities were assigned in total, resulting in approximately two defined homework tasks per session and per patient on average. The majority of the homework was classified as psychoeducational (n = 142; 35%) and behavioral (n = 128; 31%), followed by cognitive (n = 76; 18%), self-monitoring (n = 36; 9%), and relapse prevention (n = 29; 7%) homework. In total, 380 (92.5%) of the homework activities were completed. Across all patients and therapy sessions, HE was on average M = 2.71 (SD = 0.74), which translates into moderate to high HE when using the item anchors. Difficulties in completing homework assignments were reported in 75% of the sessions, with the extent of difficulties showing an average of M = 1.53 (SD = 1.10). Using the item anchors, this value translates to small to moderate difficulties. The most commonly assessed types of difficulties encountered by patients were negative events that impeded homework completion (34.1%), depressive symptoms (29.7%), and lack of strategies and options to complete homework (13.7%). Lack of time (8.2%), homework being too difficult (8.2%), and other homework-related aspects (6.0%) were further reported difficulties in completing the task. HE and TBH showed a small significant association across sessions, with a mean correlation of r = .28 (p < .05). Descriptive information on HE and TBH per session is presented in Table 3.

Table 3 Means and standard deviations for total scores of HE and TBH measures in each session

Course of HE and TBH and Their Association

With regard to variation in HE among patients and across treatment, we first ran an unconditional or null model with HE as criterion. The average HE across patients and treatment is 2.70 (SE = 0.09). Calculation of the ICC using the within- and between-patient variance shows that 25% of the variance in initial status of HE is attributable to differences among patients. Entering time as predictor (model 1), the unconditional growth model demonstrates that patients start on average with high HE (M = 3.00, SE = 0.13) and show a small reduction in HE during the course of treatment (− 0.05, p = .011). With regard to TBH, 14.8% of the variance can be attributed to differences between patients. The initial status of TBH is 2.32 (SE = 0.13) and shows a similarly small, but statistically non-significant, reduction during the course of the treatment (− 0.04, p = .307). The models regarding the course of HE and TBH are displayed in Table 4.
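The share of variance attributable to differences between patients follows directly from the two variance components of the null model. The sketch below shows the calculation; the variance values used in any call are hypothetical, since the components themselves are reported in the tables rather than in the text.

```python
def between_patient_variance_share(between_var, within_var):
    """Return the ICC of a null model: between / (between + within)."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return between_var / total
```

For example, hypothetical components of 0.25 (between patients) and 0.75 (within patients) give a share of .25, which would correspond to the 25% reported for HE.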

Table 4 Unconditional growth model for changes in HE and TBH across treatment

In order to explore the association between HE and TBH, stepwise multilevel models were built with HE as criterion in a separate model. TBH consisting of TBH-A from the previous session and TBH-R from the following session was entered as a time-varying predictor of HE in the subsequent session. TBH was significantly and positively related to HE over the course of treatment (0.24, SE = 0.07, p = .032). Results are displayed in Table 5.

Table 5 Random Intercept and random slope model for association between HE and TBH

Association Between HE, TBH, and Depressive Symptoms

For the association between HE, TBH, and depressive symptoms, we first ran an unconditional or null model, which demonstrated a within-patient variability in depressive symptoms of 38% (data not shown), indicating a nested structure of the data. After modelling the time slope (model 1), time-varying predictor 1 was entered at level 1 (model 2). Time-varying predictor 1 was HE of the current session, since ratings refer to the interval between two sessions. Higher scores on HE were associated with lower depressive symptoms over the course of treatment (− 0.83, SE = 0.35, p = .015). Comparison of model 1 and model 2 returned better fit indices (smaller AIC and BIC values indicate better fit) for model 2, the random intercept random slope model with HE as predictor (log-likelihood for model 1 = − 451.37 and for model 2 = − 448.05, p = .009; AIC for model 1 = 910.74 and for model 2 = 906.10; BIC for model 1 = 923.3 and for model 2 = 921.8). Next, the second time-varying predictor, TBH from the previous session, was introduced into the model at level 1. TBH was not significantly related to depressive symptoms (0.23, SE = 0.30, p = .437). Compared to model 2, model 3 did not show improved model fit (log-likelihood for model 2 = − 444.69 and for model 3 = − 444.24, p = .346; AIC for model 2 = 903.4 and for model 3 = 904.5; BIC for model 2 = 925.4 and for model 3 = 929.6), indicating that the model with HE as predictor fits the data better. The last model (model 4) included an interaction between the two time-varying predictors; however, the model did not converge. Results of the random intercept model (model 1), the random intercept and random slope model with one predictor (model 2), and the random intercept random slope model with two predictors (model 3) are presented in Table 6.
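The relation between the reported log-likelihoods and the fit indices can be traced with a few lines of code. This is a sketch only: the parameter counts (4 for model 1, 5 for model 2) and the number of observations (171) are assumptions back-calculated from the reported AIC and BIC values, not figures stated in the text, and the likelihood-ratio test assumes one degree of freedom.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Return (chi-square statistic, p-value) for nested models differing by 1 df."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Assumed parameter counts and sample size (see lead-in); log-likelihoods as reported.
aic_model_1 = aic(-451.37, 4)
aic_model_2 = aic(-448.05, 5)
bic_model_2 = bic(-448.05, 5, 171)
statistic, p_value = likelihood_ratio_test_df1(-451.37, -448.05)
```

Under these assumptions, the computed AIC values match the reported 910.74 and 906.10, and the test statistic of 6.64 gives a p-value close to .01, in line with the reported p = .009 once rounding of the log-likelihoods is taken into account.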

Table 6 Multilevel mixed models with random intercept and random intercept and slope

Discussion

The present study describes types and amount of homework assigned and depicts rather high levels of HE in tel-CBT. Results of our study further show that HE decreases slightly throughout the course of therapy and that TBH is related to HE over the course of therapy. Ultimately, results reveal that higher scores on HE are associated with lower levels of depressive symptoms, but that TBH and depressive symptoms are not associated.

The study demonstrates that homework assignments and engagement with homework play a central role in tel-CBT – as could be expected from the guided self-help approach. This is indicated by the overall amount of assigned homework across therapy and patients, the proportion of homework completed by patients, and the patients’ rather high HE throughout the course of the treatment. As expected, we found that homework was overall assigned in most of the therapy sessions. The fact that on average two homework assignments were prepared in each session confirms that contents were employed and implemented as scheduled by tel-CBT. This treatment format lays special emphasis on this kind of intersession activity.

When modelling the status and course of HE and TBH, both variables showed more within-patient variability compared to between-patient variability over the course of the treatment, as indicated by the ICC calculations of variance components and the slopes of the variables in the models. Inter-individual differences explained rather small proportions of the variance (25% in HE, 15% in TBH), which might indicate that both variables are dynamic rather than stable patient characteristics. The overall high HE across patients might be explained by sociodemographic and clinical patient characteristics. The average age of our sample was rather high and the vast majority of patients reported having had previous depressive episodes and psychotherapy experience. It is likely that patients with a history of depression and of undergoing treatment are trying particularly hard to make the most out of therapy. Moreover, older patients might show a sense of self-responsibility when it comes to carrying out therapeutic homework. Contrary to the belief that adult patients may have reservations regarding homework due to their age, there is evidence that adult patients have positive attitudes towards homework, with the vast majority of patients not perceiving themselves too old for homework (Fehm and Mrose 2008). HE declined slightly over the course of treatment and visual inspection of the individual courses of HE showed that drops in HE happened in some patients in single sessions. These variations are expected to be due to specific external factors that have an influence on the patient's HE at a given session. For example, further explorative analyses might scrutinize which external factors regarding homework (such as difficulties completing the homework task; lack of resources or time in a given week) and session content might be responsible for situations with a drop in HE. 
In view of previous suggestions that homework compliance might not be linear across treatment of social anxiety disorder (Leung and Heimberg 1996), future studies might employ statistical models that are suitable for detecting various patterns of HE. For example, latent growth analysis, which requires much larger samples than the one used in our study, would make it possible to detect differences in latent factors between groups of patients and to relate different HE patterns to treatment outcome (Collins and Sayer 2001).
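To illustrate how an ICC separates between-patient from within-patient variance, the following minimal Python sketch uses a one-way random-effects ANOVA estimator for balanced data. It is an illustration only and does not reproduce the multilevel models fitted in this study; the function name `icc_oneway` and the toy ratings are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC for balanced data.

    `groups` is a list with one inner list of repeated ratings per
    patient (a hypothetical layout chosen for this sketch).
    """
    n = len(groups)       # number of patients
    k = len(groups[0])    # number of ratings per patient
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch requires the same number of ratings per patient")

    grand_mean = mean(x for g in groups for x in g)
    patient_means = [mean(g) for g in groups]

    # Mean square between patients and mean square within patients.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, patient_means) for x in g
    ) / (n * (k - 1))

    # Share of the total variance attributable to differences between patients.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data (made up): three patients, three ratings each.
toy_ratings = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
toy_icc = icc_oneway(toy_ratings)
```

Read this way, an ICC of .25 means that roughly a quarter of the variance lies between patients, and the remaining three quarters reflect variation within patients across sessions.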

Our study provides empirical support for the association between HE and depressive symptoms throughout the course of tel-CBT in mildly to moderately depressed patients. Using MLM with repeated measures of predictors and outcome, we found a medium-sized association between HE shown between sessions and depressive symptoms in the subsequent session. In other words, when HE increases by one unit in an interval of two sessions, patients’ symptomatology decreases by an average of 0.8 units on the PHQ-9 in the subsequent session. Overall, this result is in line with meta-analytic evidence on the relation between homework compliance and treatment outcome, which shows a weighted mean effect size on therapy outcome of r = .22 for homework compliance and r = .36 for the employment of homework in therapy (Kazantzis et al. 2000). Moreover, the result corresponds to one previous study focusing on a similar conceptualization of HE, which found an immediate effect of HE on symptom outcome in the subsequent session (Conklin and Strunk 2015). In our study, TBH was not associated with depressive symptoms in the subsequent session. However, our results indicate that TBH was significantly related to HE over the course of treatment, which corresponds to the results of a previous study that found TBH to significantly predict subsequent HE (Conklin et al. 2018). One explanation for these findings could be that some clinically beneficial TBH were less present in the therapists’ overall behavior, so that TBH exerted an effect on HE but not on depressive symptoms. Even though the homework procedure in our study tended to be therapist-initiated, the patients took an active part in tel-CBT, as the majority of the session time was spent on reviewing patients’ experiences with the previous homework and discussing future homework. It needs to be stressed that therapists were not trained in specific assignment and review procedures.
This means that some aspects of assigning homework that received clinical and empirical support in previous work were not implemented in our study. For example, it is recommended to write down homework tasks and instructions (Cox et al. 1988) in order to ensure higher homework compliance. Moreover, a recent study provides preliminary support for the importance of designing homework tasks that are congruent with what the patient perceived as helpful in the session (Jensen et al. 2020). Since therapists were instructed to adhere to the homework assignments as scheduled, they were not entirely free to consider whether the homework type scheduled for a specific session was appropriate for the patients’ current problem or situation. It is likely that therapists, despite strictly assigning the activity types as scheduled in the treatment manual, adequately adapted the different homework types to the patient's individual situation and promoted the patient's willingness and ability to engage with homework outside the therapy session. Our results further suggest that the specific type of homework might not be the only relevant factor for higher HE, as long as therapists assign and review homework in an elaborate, comprehensible, and convincing manner. Lastly, it is important to consider that the association between TBH and HE might run in the opposite direction, in that patients’ higher HE and their reporting thereof might have influenced the therapists’ reactions to the patients’ reports.
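The lagged, within-patient association described above can be made concrete with a small data-preparation sketch in Python. It is purely illustrative and does not reproduce the models fitted in this study: it assumes one (HE, PHQ-9) pair per session, person-mean centering of HE, and a lag of one session. The function name `lagged_within_person`, the data layout, and the toy values are hypothetical.

```python
from statistics import mean


def lagged_within_person(series):
    """Pair each HE rating with the PHQ-9 score of the following session.

    `series` maps a patient id to a list of (he, phq9) tuples in session
    order (assumed layout). HE is centered on the patient's own mean, so
    that each value expresses a deviation from that patient's usual
    level rather than a difference between patients.
    """
    rows = []
    for patient_id, sessions in series.items():
        he_mean = mean(he for he, _ in sessions)
        # zip(sessions, sessions[1:]) walks over consecutive session pairs.
        for (he_prev, _), (_, phq_next) in zip(sessions, sessions[1:]):
            rows.append((patient_id, he_prev - he_mean, phq_next))
    return rows


# Toy data (made up): one patient with three sessions.
toy_series = {"p1": [(2.0, 10), (4.0, 8), (3.0, 6)]}
toy_rows = lagged_within_person(toy_series)
```

Rows prepared in this way could then enter a multilevel model with sessions nested within patients. Centering on the patient's own mean keeps the within-patient effect separate from stable differences between patients.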

The present results need to be interpreted in due consideration of several limitations. First, the predictor variables were assessed using two self-constructed rating scales, which had not been validated prior to the study. We did not use standardized or validated instruments to assess HE and TBH, because no process rating instrument targeting the particular conceptualization of these variables exists. We aimed at expanding on the previously reported Homework Engagement Scale (HES) by Conklin and Strunk (2015) by adding indicators such as intensity of HE or difficulties faced when engaging with homework. Despite good psychometric properties for both scales with regard to internal consistency and moderate to good properties regarding IRR, the validity of GHES might be constrained: Even though GHES is an objective observer-based rating instrument with a precise rating manual, the items do not always allow a direct observation of facets relevant to HE. The appraisal of each item relies on the patient expressing his or her thoughts and experiences with the homework process. However, these narratives might not cover all areas of interest in the rating instrument. For example, the rating on the difficulty item is indirectly inferred from the patient's narratives about how engaging with homework went. If the patient did in fact face difficulties affecting HE, but did not explicitly mention these when talking about how the homework activity went, the measurement of difficulties faced in this situation might not be representative of HE. The rating therefore relates to the raters’ appraisal of whether a patient had faced challenges that might have affected HE, rather than the patients’ subjective feelings or the true influence of experienced difficulties on HE.
In future research, objective and observer-based assessments of HE might be supplemented by patients’ reports of difficulties faced, as well as by patient ratings of the depth with which they engaged in homework activities and of the perceived benefits of homework. Second, the StHAR did not specifically target competence or quality of assigning and reviewing homework. Future studies might develop and employ rating instruments that clearly differentiate the extent of TBH shown by the therapist from the competency of these therapeutic actions. Moreover, patient ratings of whether, in the patients’ view, therapists assigned and reviewed the homework in a skilful manner might add to a better understanding of clinically meaningful TBH.

Third, our methodology and our analytic strategy do not allow for any causal inferences regarding HE and depressive symptoms, despite multiple assessments of HE in session intervals and the depressive symptoms assessed at the beginning of each session. Reverse causation cannot be excluded, since patients might have reported on homework more elaborately and positively in the sessions due to an improved mood. Moreover, depressive symptoms were assessed retrospectively for the time period since the last therapy session. Fourth, the study sample was rather small. Therefore, additional exploratory statistical models for our third research question (e.g., models including interaction terms) did not converge. Lastly, selection bias might have occurred, as the majority of the patients self-referred to the overarching clinical trial. This potentially led to the inclusion of generally motivated patients who showed rather small variability in HE and therefore also did not require the therapist to intervene in a way that promotes HE or improves depressive symptoms.

Even though our results should be regarded as preliminary evidence, the findings add to the body of literature due to several strengths. A more comprehensive concept of the extent of homework compliance was used in the present study, going beyond commonly used quantitative measures of homework completion or single-item compliance measures. Several differences between HE and previous operationalizations of homework compliance exist. HE incorporates facets of the quality and the intensity of patients’ engagement with the homework tasks, the estimated benefit for the patient of undertaking homework, the estimated transfer of acquired skills to the patients’ daily lives, as well as the difficulties experienced by the patient when completing homework. Another strength of the study is the conceptualization of TBH, which incorporates multiple facets regarding preparing and reviewing homework, informed by clinical recommendations. These aspects were derived from listening to and rating complete therapy sessions with high reliability, as indicated by the IRR analyses. Moreover, observer-based ratings of both HE and TBH might provide more objective estimations of HE and of the discussion of tasks in the therapy session compared to client or therapist reports (Mausbach et al. 2010). Lastly, our study provides insight into the course of HE and TBH throughout the entire treatment, which helps generate hypotheses regarding the nature of HE and its relation to TBH and depressive symptoms.

Conclusions

The study provides evidence that homework is implemented by therapists and patients in tel-CBT. Engagement with homework and therapists’ actions to assign and discuss homework vary across treatment in this sample. On average, however, a slight decrease of HE throughout the treatment was observed, and patients who show high HE experience lower depressive symptoms on average. Future studies are warranted that use designs allowing the direction of causality to be determined and that employ reliable and more economical ways of retrieving information regarding HE in the patients’ natural environments (e.g., using ecological momentary assessment). This approach would allow for recording patients’ HE close to occurrence and would provide information regarding reasons for low HE as well as facilitators for completing homework without recall bias. TBH was not related to depressive symptoms but showed an association with HE. Future studies might examine whether TBH moderates the relationship between HE and symptom improvement and whether specific homework types require specific therapist skills to assign and review in a meaningful way.