The prevalence of mental health disorders among undergraduate students increased from 22 to 36% between 2007 and 2017 (Lipson et al., 2018). Thirty-five percent of first-year college students worldwide met criteria for at least one lifetime psychological disorder (Auerbach et al., 2018). In 2020, 35% and 39% of undergraduates met criteria for major depressive disorder and generalized anxiety disorder, respectively, and suicide is identified as one of the leading causes of death in this population (Chirikov et al., 2020; Turner et al., 2013). The proportion of students seeking mental health services also increased substantially, from 19 to 34%, between 2007 and 2017 (Lipson et al., 2018). The COVID-19 global pandemic has only exacerbated mental health concerns among university students, increasing loneliness and isolation and significantly affecting quality of life (Chirikov et al., 2020; Lederer et al., 2021). Thus, there is an urgent need to consider additional resources and interventions for addressing and improving mental health in this population.

Research has shown that 25% of undergraduates endorse clinical levels of shame (Andrews et al., 2002; Cook, 1996). Shame is a highly painful emotion reflecting negative judgment, disapproval, or rejection of core aspects of the self (Cândea & Szentágotai-Tăta, 2018). Shame can be internal, in which the self is both the judge and the object of judgment, or external, in which the judge is the other as seen through one’s own eyes (Gilbert, 2000; Matos et al., 2013). Elevated levels of shame have been observed across numerous psychological disorders, including anxiety disorders (e.g., Fergus et al., 2010), depression (e.g., Mills et al., 2015), post-traumatic stress disorder (e.g., Feiring et al., 2002), eating disorders (e.g., Kelly & Carter, 2013), and personality disorders (e.g., Ritter et al., 2013). Some data have also suggested that shame is related to poorer psychotherapy outcome (Wiltink et al., 2016). The impairing, transdiagnostic nature of shame makes it an important target for intervention.

Shame and self-compassion have evidenced a significant, inverse relationship and fostering self-compassion has been associated with decreased shame (Kelly & Waring, 2018; Matos et al., 2017; Reilly et al., 2014; Woods & Proeve, 2014). Compassion is understood as a sensitivity to the suffering of self and others, with a deep commitment to try to relieve it (Lama, 2001). Compassion has three flows: compassion flowing out to others, compassion flowing in from others, and self-compassion (i.e., compassion flowing in from the self). Self-compassion is considered to be an antidote for shame, as it involves the courage to attune to one’s own suffering accompanied by the wisdom to act in ways that may be helpful in moments of pain (Gilbert, 2010). Directing compassion toward oneself in moments of pain or struggle provides an opportunity for individuals to acknowledge and validate their experience, rather than criticize or punish themselves, which can prolong suffering. Self-compassion, as an alternative to self-judgment or self-disparagement, allows individuals the space to consider what might be helpful during a period of struggle, which is an adaptive alternative to rumination and self-criticism.

Two well-known approaches to foster greater self-compassion are Compassion-Focused Therapy (CFT; Gilbert, 2010, 2014) and Mindful Self-Compassion (MSC; Neff, 2003a). CFT focuses on helping individuals learn to regulate their emotions, establish safeness within themselves, and increase the warmth, care, and kindness with which they relate to themselves (Gilbert, 2010). MSC helps individuals foster greater self-compassion through mindfulness, self-kindness, and the concept of common humanity, which highlights the ways in which human beings can relate to one another through the common experience of pain (Neff, 2003a). Established mindfulness, self-compassion, and acceptance-based interventions also focus on factors including the mind–body relationship, fostering non-judgment and acceptance of experience, and affect regulation through self-soothing (for a meta-analysis of self-compassion-focused interventions, see Ferrari et al., 2019).

Interventions that focus on fostering self-compassion demonstrated increases in self-compassion and compassion for others, self-reassurance, and self-soothing, as well as reductions in shame, self-criticism, anxiety, depression, stress, perceived inferiority, and submissive behavior (Cuppage et al., 2018; Gilbert & Procter, 2006; Judge et al., 2012; Matos et al., 2017). Importantly, some evidence suggested that these interventions may have lasting effects (Cuppage et al., 2018). One exercise that can be used to enhance self-compassion is self-compassionate writing, and existing studies examining the efficacy of short-term self-compassionate writing practice showed promising results (Johnson & O’Brien, 2013; Kelly & Waring, 2018; Stern & Engeln, 2018; Wong & Mak, 2016). Compared to control conditions, self-compassionate writing was associated with increased body satisfaction and positive affect in undergraduate women (Stern & Engeln, 2018), decreases in bodily shame and increases in self-compassion in women with anorexia nervosa (Kelly & Waring, 2018), and lower levels of negative affect and state shame in university students (Johnson & O’Brien, 2013). Given that only 25% of undergraduate students indicated that they would seek treatment for an emotional problem, with the majority of students stating that they would prefer to address such difficulties on their own (Ebert et al., 2019), it is crucial to identify mental health interventions that are not only effective but that students feel comfortable utilizing. Self-compassionate letter writing can be done alone and practiced when needed and thus may appeal to students, particularly for those whose shame interferes with seeking treatment.

Given the importance of finding additional ways to improve mental health among undergraduate students, the current study sought to examine the helpfulness of a 2-week self-compassionate letter-writing intervention for undergraduate students with high levels of shame. In addition to examining effects of the intervention following the two-week self-compassionate practice, we included a one-month follow-up assessment to determine whether any therapeutic effects would be maintained following cessation of the intervention, which would support initial findings of longer-term gains (Cuppage et al., 2018). We hypothesized that those who practiced self-compassionate letter writing would experience greater decreases in global and external shame, self-criticism, general anxiety, and depression, and greater increases in self-compassion than those in the waitlist control group and that these gains would be maintained at follow-up.

Method

Participants

A power analysis for a repeated-measures between-subjects analysis of variance (ANOVA) design conducted in G*Power (Faul et al., 2009) yielded a total target sample size of 62 to detect a medium effect size with 80% power. Given that there is no gold standard power analysis method for multilevel modeling, and previous studies demonstrated small to medium increases in self-compassion and medium to large reductions in shame in samples of 40–90 (e.g., Johnson & O’Brien, 2013; Kelly & Waring, 2018), we aimed for a total target enrollment of 60 participants, 30 per group.

The final sample for this study comprised 68 undergraduate students recruited through the university’s online research system, flyers posted around campus, and departmental e-mail listservs. Inclusion criteria were being 18 years of age or older, self-reported English fluency in speaking, reading, and writing, and baseline scores ≥ 65 on the Experience of Shame Scale (ESS; Andrews et al., 2002). Though there is no established cut-off score on the ESS to indicate clinically meaningful shame, research in an English-speaking undergraduate sample indicated that the mean ESS score was 55.58 (SD = 13.95; Andrews et al., 2002), suggesting that the 75th percentile of ESS scores falls at approximately a score of 65, which was used as the minimal cut-off score for the current study.
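The cut-off reasoning above can be checked with a short calculation. The sketch below assumes ESS scores are approximately normally distributed, an assumption the text does not state, and uses only the mean and SD cited from Andrews et al. (2002).

```python
from statistics import NormalDist

# Mean and SD of ESS scores in an undergraduate sample (Andrews et al., 2002),
# as cited in the paragraph above.
ess_mean = 55.58
ess_sd = 13.95

# Assumption for illustration: ESS scores are roughly normal.
ess_dist = NormalDist(mu=ess_mean, sigma=ess_sd)

# Score below which 75% of the assumed distribution falls.
p75 = ess_dist.inv_cdf(0.75)
print(f"Approximate 75th percentile of ESS scores: {p75:.2f}")
```

Under this assumption the 75th percentile lands just below 65, which matches the cut-off described above.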

Demographic characteristics of the sample are displayed in Table 1. Sixty-eight participants completed the baseline assessment (\(n_{\text{intervention}}\) = 29; \(n_{\text{control}}\) = 39), 50 participants completed the post-intervention assessment (\(n_{\text{intervention}}\) = 20; \(n_{\text{control}}\) = 30), and 32 participants completed the follow-up assessment (\(n_{\text{intervention}}\) = 15; \(n_{\text{control}}\) = 17).

Table 1 Demographic and baseline clinical characteristics of sample (n = 68)

Procedure

Participants completed baseline measures, and those who were eligible were randomly assigned to the self-compassionate letter-writing condition or a waitlist control group. A CONSORT study flow diagram is presented in Fig. 1. Randomization was stratified by ESS scores to increase the likelihood that the distribution of shame scores would be equivalent across groups. Participants were randomized in a 1:1 ratio from the initiation of the study through February 2020. Because recruitment was slower than anticipated, in part owing to the COVID-19 pandemic, we randomized participants in a 2:1 ratio (intervention vs. control group) between March 2020 and January 2021, which was the end of the data collection period. Participants in both groups completed post-assessment measures 16 days following completion of baseline measures as well as a one-month follow-up assessment. Participants assigned to the control group were given the opportunity to engage in the self-compassionate letter-writing practice upon completion of the study.

Intervention participants were first directed to an online video that offered brief psychoeducation about self-compassion and the current study. Participants were asked to reserve 30 min per day for this exercise and to be in a private space where they could read their letters out loud to themselves. They were then instructed to listen to audio recordings that guided them through imaginal and written exercises to foster compassion for others, including writing a compassionate letter to an imagined other who was experiencing pain or suffering. Starting in the second session, participants were prompted to begin self-compassionate letter writing (they could also continue writing compassionate other-focused letters on an optional basis). They were asked to complete daily self-compassionate letters for the remainder of the study.

As compensation, all participants were offered course credit and the opportunity to be entered into a gift card raffle for up to US$150 based on level of participation. The current intervention was inspired by the research of Kelly and Leybman (2012) and Kelly and Waring (2018) and draws on Gilbert’s CFT (2010) as well as Neff and Germer’s MSC (2013). For further details about the self-compassionate letter-writing intervention, see Table 2.

Fig. 1 CONSORT flow diagram

Table 2 Description of self-compassionate letter-writing intervention

Measures

Demographic Characteristics

At baseline, all participants completed a brief measure assessing demographic variables. Participants were also asked to report if they were in therapy or receiving psychiatric treatment at the time of the study. Participants were also asked at the end of their participation whether any change in concurrent treatment had occurred during the study period.

Experience of Shame Scale (ESS)

The ESS is a 25-item measure of global shame that probes characterological, behavioral, and bodily shame. Items are rated on a 4-point Likert-type scale based on the past year. Scores on the ESS range from 25–100, with higher scores indicating higher levels of shame. The ESS has shown strong internal consistency (α = 0.92), good 11-week test–retest reliability (r = 0.88), and strong convergent validity with other measures of shame (Andrews et al., 2002). The ESS demonstrated strong internal consistency and scale reliability in our sample at baseline (α = 0.83, ω = 0.79), post-assessment (α = 0.94, ω = 0.94), and follow-up (α = 0.94, ω = 0.94).

The Other as Shamer Scale-2 (OAS-2)

The OAS-2 is an 8-item version of the original OAS (Goss et al., 1994), which measures external shame. The OAS-2 asks individuals to rate how frequently they experience external shame on a 5-point Likert-type scale, such that higher scores indicate greater external shame. The OAS-2 demonstrated strong internal consistency (α = 0.82), as well as a large correlation with the original OAS (r = 0.91) and a moderate correlation with the ESS (r = 0.54; Matos et al., 2015). The OAS-2 demonstrated strong reliability in our sample at baseline (α = 0.85, ω = 0.85), post-assessment (α = 0.90, ω = 0.90), and follow-up (α = 0.95, ω = 0.95).

Forms of Self Criticizing/Attacking and Self-Reassuring Scale (FSCRS)

The FSCRS comprises three subscales, one of which is the 9-item self-criticism subscale, Inadequate Self (e.g., “I am easily disappointed with myself”), which was used in this study. Items on the Inadequate Self subscale are rated on a 5-point Likert-type scale, with higher scores indicating higher self-criticism. The Inadequate Self subscale demonstrated strong internal consistency (α = 0.90) as well as good convergent validity with other measures of self-criticism (r-values = 0.63–0.77; Gilbert et al., 2004). The Inadequate Self subscale demonstrated strong internal consistency and reliability in our sample at baseline (α = 0.88, ω = 0.88), post-assessment (α = 0.92, ω = 0.92), and follow-up (α = 0.92, ω = 0.92).

Self-Compassion Scale-Short Form (SCS-SF)

The SCS-SF is a 12-item version of the original 26-item SCS (Neff, 2003b) measuring self-compassion. Items are rated on a 5-point scale, with higher scores indicating greater self-compassion. The SCS-SF has demonstrated good internal consistency (α = 0.87) and is strongly correlated with the original SCS (r = 0.97; Raes et al., 2011). The SCS-SF showed strong internal consistency and reliability in our sample at baseline (α = 0.84, ω = 0.83), post-assessment (α = 0.86, ω = 0.85), and follow-up (α = 0.88, ω = 0.87).

Patient Health Questionnaire-9 (PHQ-9)

The PHQ-9 is a 10-item screening tool for depression. Individuals are asked to rate the severity of depressive symptoms occurring over the past two weeks. Items are rated on a 4-point Likert-type scale, with higher scores indicating more severe depression. The PHQ-9 has demonstrated strong internal consistency (α = 0.89; Kroenke et al., 2001) and scale reliability, which was also true in our sample at baseline (α = 0.83, ω = 0.83), post-assessment (α = 0.92, ω = 0.92), and follow-up (α = 0.91, ω = 0.91).

Generalized Anxiety Disorder 7-item Scale (GAD-7)

The GAD-7 assesses symptoms of generalized anxiety disorder. Individuals rate their symptoms of general anxiety over the past two weeks on a 4-point Likert-type scale, with higher scores indicating more severe anxiety. The GAD-7 demonstrated strong internal consistency (α = 0.92) and one-week test–retest reliability (intraclass correlation = 0.83; Spitzer et al., 2006). The GAD-7 demonstrated strong internal consistency and scale reliability in our sample at baseline (α = 0.82, ω = 0.83), post-assessment (α = 0.85, ω = 0.85), and follow-up (α = 0.92, ω = 0.92).

Data Analyses

Preliminary analyses

Descriptive statistics and bivariate correlations were conducted in IBM® SPSS® Statistics, Version 24. Two-tailed independent samples t-tests were conducted to assess differences between the intervention and control groups on continuous demographic variables (e.g., age) and baseline measures. χ2 tests were conducted to determine whether groups differed on categorical demographic variables.

To examine the influence of potential confounding variables on the effect of time, standardized residual change scores for the baseline-to-post-assessment and the post-assessment-to-follow-up periods were calculated. Standardized residual change scores were selected over raw change scores because they account for variability at baseline and thus are considered superior estimates of change (Tucker et al., 1966).
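As a rough illustration of the residual change approach, the sketch below regresses later scores on baseline scores and z-scores the residuals. The function name and the example scores are hypothetical. Standardizing by the sample SD of the residuals is a simplification, so values may differ slightly from those produced by SPSS.

```python
from statistics import mean, stdev

def standardized_residual_change(baseline, later):
    """Regress later scores on baseline scores (ordinary least squares)
    and return the residuals divided by their sample SD."""
    mx, my = mean(baseline), mean(later)
    sxy = sum((x - mx) * (y - my) for x, y in zip(baseline, later))
    sxx = sum((x - mx) ** 2 for x in baseline)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed later score minus the score predicted from baseline.
    residuals = [y - (intercept + slope * x) for x, y in zip(baseline, later)]
    sd = stdev(residuals)
    return [r / sd for r in residuals]

# Hypothetical shame scores for six participants (illustration only).
baseline_scores = [70, 82, 66, 91, 75, 68]
post_scores = [61, 80, 60, 77, 74, 59]

z_change = standardized_residual_change(baseline_scores, post_scores)
print([round(z, 2) for z in z_change])
```

A negative value indicates a later score lower than predicted from baseline, that is, more improvement than expected on a symptom measure.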

Correlations between standardized residual change scores and baseline demographic characteristics were examined using Pearson’s r and Spearman’s ρ. Additionally, standardized residual change scores for the baseline-to-post-assessment and post-assessment-to-follow-up periods were correlated with the number of letters written in the intervention group to probe for the presence of a possible dose–response effect. Gender was examined as a covariate in the multilevel models. However, gender and other covariates did not affect model outcomes, so they were removed to conserve power in the final analyses.

Primary analyses

Because observations were nested within participants over time, we conducted hierarchical linear modeling (HLM) using restricted maximum likelihood (REML) estimations in R (R Development Core Team, 2013) with the lme4 and lmerTest packages (Bates et al., 2015; Kuznetsova et al., 2017). HLM using REML is robust to missing data and unequal assessment timepoint intervals across persons (Raudenbush & Bryk, 2002) and thus provides unbiased estimates in the presence of incomplete data.

Prior to conducting HLM, the assumptions of normality, linearity, and homogeneity of the residuals were assessed. The assumptions of linearity and homogeneity were examined graphically. The normality assumption was examined via standard z-scores of skewness and kurtosis. The standard error covariance structure was compared to compound symmetry and first-order autoregressive covariance structures using estimates of the Akaike Information Criterion (AIC) statistic (Akaike, 1981). The intraclass correlation coefficient (ICC) was calculated by dividing the random effect variance by the total variance to determine the proportion of variance explained by between-person differences.
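The ICC calculation described above reduces to a simple ratio. The sketch below uses hypothetical variance components. In practice these would be read from the unconditional intercept model, and the function name is illustrative only.

```python
def intraclass_correlation(between_person_var, residual_var):
    """Proportion of total variance attributable to between-person differences."""
    total_var = between_person_var + residual_var
    if total_var <= 0:
        raise ValueError("Total variance must be positive.")
    return between_person_var / total_var

# Hypothetical variance components (not taken from the study).
icc = intraclass_correlation(between_person_var=45.0, residual_var=55.0)
print(f"ICC = {icc:.2%} of variance is between persons")
```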

To assess the primary study aims, we examined the impact of the intervention on the six outcome variables. First, an unconditional intercept model was specified with no predictors in the model: \(\text{Outcome}_{ti} = \beta_{00} + \varepsilon_{ti}\). Next, an unconditional growth model that included time (Timepoint) at the occasion level was estimated: \(\text{Outcome}_{ti} = \beta_{00} + \beta_{01}(\text{Timepoint}) + \varepsilon_{ti}\). The final interaction model included time, group, and the group by time interaction as predictors of the level-1 intercept and slope parameters: \(\text{Outcome}_{ti} = \beta_{00} + \beta_{01}(\text{Timepoint}) + \beta_{02}(\text{Group}) + \beta_{03}(\text{Timepoint})(\text{Group}) + \mu_{01}(\text{Timepoint}) + \varepsilon_{ti}\). To better understand the differential effects of the intervention, all significant group by time interactions were probed using simple slope analyses.
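To make the fixed-effects part of the final interaction model concrete, the sketch below computes model-implied outcomes and the time slope within each group. It assumes Group is coded 0 for control and 1 for intervention and that Timepoint is coded 0, 1, 2. Neither coding is stated in the text, and the coefficient values are hypothetical. The random slope and residual terms are omitted because their expected value is zero.

```python
def predicted_outcome(b00, b01, b02, b03, timepoint, group):
    """Fixed-effects prediction from the final interaction model."""
    return b00 + b01 * timepoint + b02 * group + b03 * timepoint * group

def simple_slope(b01, b03, group):
    """Model-implied change per timepoint within one group."""
    return b01 + b03 * group

# Hypothetical coefficients (illustration only, not study estimates).
coef = {"b00": 75.0, "b01": -2.0, "b02": 1.0, "b03": -6.0}

control_slope = simple_slope(coef["b01"], coef["b03"], group=0)
intervention_slope = simple_slope(coef["b01"], coef["b03"], group=1)
post_intervention_mean = predicted_outcome(**coef, timepoint=1, group=1)

print(control_slope, intervention_slope, post_intervention_mean)
```

Under this coding, the interaction coefficient is the difference between the two groups' time slopes, which is why a significant interaction motivates examining change within each group separately.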

Results

Preliminary Analyses

Two-tailed independent samples t-tests revealed no differences between the intervention and control groups on age or any outcome measure at baseline; χ2 tests revealed no group differences in demographic characteristics (Table 1). Means and standard deviations for all outcome variables across the three assessment time points, as well as between- and within-group effect sizes, are displayed in Table 3. Bivariate correlations are presented in Table 4.

Table 3 Raw means and standard deviations of study variables and effect sizes
Table 4 Bivariate correlations of study variables at baseline

All associations between demographic characteristics and standardized residual change scores for the baseline-to-post-assessment period were nonsignificant, p > 0.05. However, for the post-assessment-to-follow-up period, identifying as female was associated with greater decreases in external shame, ρ = 0.38, p = 0.04, and general anxiety symptoms, ρ = 0.59, p = 0.001. Additionally, receiving concurrent treatment at the time of the study (i.e., psychotherapy and/or psychotropic medication) was associated with greater decreases in global shame, ρ = 0.40, p = 0.03. Correlations between follow-up period standardized residual change scores and gender minority status, sexual orientation minority status, race, ethnicity, and change in treatment were nonsignificant, ρ = 0.01–0.37, p > 0.05.

For participants in the intervention condition, the number of letters written was not associated with baseline-to-post changes, r = 0.07–0.34, p > 0.14, or post-to-follow-up changes, r = 0.01–0.20, p > 0.50, on any outcome measure. Change in concurrent treatment over the course of the study was not associated with change in any outcome measure during the baseline-to-post period, r = 0.01–0.17, p > 0.24. However, change in concurrent treatment was significantly associated with greater reductions in general anxiety symptoms during the follow-up period, ρ = 0.38, p = 0.04.

Finally, AIC statistics across standard, compound symmetry, and first-order auto-regressive covariance component structures were compared to determine the most parsimonious structure to specify for subsequent analyses. Results indicated that the standard covariance structure demonstrated the smallest AIC statistics relative to the other potential structures. Evaluation of residuals suggested that the assumptions of linearity and homoscedasticity were met across outcomes.

Primary Analyses

Global shame

Based on the unconditional intercept model, 32.88% of the variance in global shame was accounted for by between-person effects. There was an effect of time predicting a negative linear change in global shame, \(\beta_{01}\) = -4.90, t(41.35) = -3.70, p < 0.001. The unconditional growth model (AIC = 1181.40) demonstrated superior model fit compared to the unconditional intercept model (AIC = 1204.70), χ2(3) = 29.29, p < 0.001. In the final model, the main effects of group, \(\beta_{01}\) = 5.77 (4.03), t(105.82) = 1.43, p = 0.15, and time, \(\beta_{01}\) = -2.23 (1.66), t(57.90) = -1.34, p = 0.18, were nonsignificant. However, there was a significant group by time interaction, \(\gamma_{01}\) = -6.03 (2.49), t(53.77) = -2.42, p = 0.02. Those in the intervention condition experienced a significant decrease in global shame during the baseline-to-post-assessment period, t(19) = 5.49, p < 0.001, whereas the control group did not, t(29) = 1.37, p = 0.18. Neither the intervention, t(13) = -0.25, p = 0.81, nor control group, t(15) = 0.33, p = 0.75, experienced significant changes in global shame during the follow-up period. The final model (AIC = 1178.90) accounted for 56.66% of the variance in global shame and evidenced better model fit compared to the unconditional growth model, χ2(2) = 6.58, p = 0.04.
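The model comparison reported for global shame can be checked by hand. For a chi-square statistic with 2 degrees of freedom, the upper-tail probability has the closed form exp(-x/2). The sketch below applies it to the reported statistic and computes the AIC difference between the two models. The helper name is illustrative.

```python
import math

def chi2_upper_tail_df2(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df."""
    if statistic < 0:
        raise ValueError("Chi-square statistic cannot be negative.")
    return math.exp(-statistic / 2.0)

# Values reported above for global shame.
lrt_statistic = 6.58       # final model vs. unconditional growth model
aic_growth = 1181.40
aic_final = 1178.90

p_value = chi2_upper_tail_df2(lrt_statistic)
aic_drop = aic_growth - aic_final

print(f"p = {p_value:.3f}, AIC reduction = {aic_drop:.2f}")
```

The result rounds to the reported p = 0.04, and the positive AIC reduction points the same way: the final model fits better after accounting for its additional parameters.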

External shame

Based on the unconditional intercept model, 44.95% of the variance in external shame was accounted for by between-person effects. There was no main effect of time, \(\beta_{01}\) = -0.72, t(45.94) = 1.17, p = 0.25. The unconditional growth model (AIC = 986.18) was not significantly different from the unconditional intercept model (AIC = 986.39), χ2(3) = 6.21, p = 0.10. Results from the final model indicated that the main effects for group, \(\beta_{01}\) = 1.90 (2.26), t(62.60) = 0.84, p = 0.41, and time, \(\beta_{01}\) = 0.41 (0.82), t(50.47) = 0.51, p = 0.61, were nonsignificant. However, there was a significant group by time interaction, \(\gamma_{01}\) = -2.54 (1.22), t(45.58) = 2.08, p = 0.04. Probing the interaction further revealed that intervention participants experienced a significant reduction in external shame during the baseline-to-post-assessment period, t(19) = 2.46, p = 0.02, whereas the control group evidenced a marginally significant increase in scores during this period, t(29) = -1.84, p = 0.08. Neither the intervention, t(13) = 1.02, p = 0.33, nor control group, t(15) = 1.10, p = 0.29, experienced further significant changes in external shame during the follow-up period. The final model (AIC = 984.13) accounted for 55.87% of the variance in external shame and demonstrated superior model fit compared to the unconditional growth model, χ2(2) = 6.04, p < 0.05.

Self-criticism

Based on the unconditional intercept model, 39.93% of the variance in self-criticism was accounted for by between-person effects. There was no significant main effect of time, \(\beta_{01}\) = -1.10, t(44.49) = -1.70, p = 0.10. The unconditional growth model (AIC = 988.33) had significantly better model fit compared to the unconditional intercept model (AIC = 990.74), χ2(3) = 8.41, p = 0.04. Estimates from the final model indicated that the main effects for group, \(\beta_{01}\) = 3.07 (2.37), t(59.14) = 1.29, p = 0.20, and time, \(\beta_{01}\) = 0.11 (0.85), t(49.04) = 0.13, p = 0.90, were nonsignificant. However, there was a significant group by time interaction, \(\gamma_{01}\) = -2.74 (1.28), t(44.17) = -2.14, p = 0.04. Probing within-group effects further, the intervention condition experienced a marginally significant reduction in self-criticism during the baseline-to-post-assessment period, t(19) = 1.80, p = 0.09. The control condition evidenced a nonsignificant increase in scores during this period, t(29) = -0.23, p = 0.82. Neither the intervention, t(13) = 1.29, p = 0.22, nor control group, t(15) = 0.88, p = 0.39, experienced significant changes in self-criticism during follow-up. The final model (AIC = 987.23) accounted for 54.31% of the variance in self-criticism and was not superior to the unconditional growth model, χ2(2) = 5.10, p = 0.08, although this result was only marginally nonsignificant.

Depressive symptoms

Based on the unconditional intercept model, 58.08% of the variance in depressive symptoms was accounted for by between-person effects. There was no main effect of time, \(\beta_{01}\) = -0.57, t(68.56) = -1.18, p = 0.24. The unconditional growth model (AIC = 948.02) did not demonstrate better model fit compared to the unconditional intercept model (AIC = 946.40), χ2(3) = 4.38, p = 0.22. When evaluating the final interaction model, the main effects for group, \(\beta_{01}\) = 0.53 (1.92), t(60.99) = 0.27, p = 0.79, and time, \(\beta_{01}\) = 0.19 (0.64), t(45.01) = 0.30, p = 0.77, were nonsignificant. However, there was a marginally significant interaction between group and time, \(\gamma_{01}\) = -1.71 (0.96), t(40.44) = -1.78, p = 0.08, suggesting that those in the intervention condition showed greater decreases in depressive symptoms. The final model (AIC = 946.82) accounted for 61.73% of the variance in depressive symptoms and was not superior to the unconditional growth model, χ2(2) = 5.20, p = 0.07, although this result was only marginally nonsignificant.

General anxiety

Based on the unconditional intercept model, 47.95% of the variance in general anxiety symptoms was accounted for by between-person effects. There was a main effect of time predicting a negative linear change in general anxiety, \(\beta_{01}\) = -1.22, t(38.00) = -2.82, p < 0.01. The unconditional growth model (AIC = 896.97) demonstrated significantly better model fit compared to the unconditional intercept model (AIC = 903.65), χ2(3) = 12.69, p = 0.01. Results from the final model indicated that both main effects of group, \(\beta_{01}\) = 2.12 (1.61), t(56.71) = 1.31, p = 0.19, and time, \(\beta_{01}\) = -0.41 (0.57), t(41.68) = -0.71, p = 0.48, were nonsignificant. There was a significant interaction between group and time on general anxiety symptoms, \(\gamma_{01}\) = -1.81 (0.85), t(37.27) = -2.13, p = 0.04. The intervention condition demonstrated a significant decrease in general anxiety symptoms during the baseline-to-post-assessment period, t(19) = 2.59, p = 0.02, whereas the control group did not experience change in general anxiety symptoms, t(29) = 0.11, p = 0.91. In the follow-up period, the intervention condition exhibited a marginally significant further reduction in general anxiety, t(13) = 2.06, p = 0.06. There was no significant change in general anxiety symptoms in the control condition during this period, t(15) = 1.05, p = 0.31. The final model (AIC = 896.33) accounted for 57.89% of the variance in general anxiety symptoms. However, it was not superior to the unconditional growth model, χ2(2) = 4.64, p = 0.10.

Self-compassion

Based on the unconditional intercept model, 44.51% of the variance in self-compassion was accounted for by between-person effects. There was no main effect of time on self-compassion, \(\beta_{01}\) = 0.98, t(52.30) = 1.33, p = 0.19. The unconditional growth model (AIC = 1023.90) demonstrated significantly better model fit compared to the unconditional intercept model (AIC = 1029.60), χ2(3) = 11.71, p = 0.01. Estimates from the final interaction model yielded nonsignificant effects for group, \(\beta_{01}\) = -1.57 (2.76), t(66.58) = -0.57, p = 0.57, and time, \(\beta_{01}\) = -0.16 (0.96), t(55.72) = -0.17, p = 0.87. There was a marginally significant interaction between group and time, \(\gamma_{01}\) = 2.66 (1.46), t(51.33) = 1.82, p = 0.07, suggesting that those in the intervention condition showed greater increases in self-compassion. The final model (AIC = 1022.10) accounted for 70.56% of the variance in self-compassion, and was marginally superior to the unconditional growth model (AIC = 1023.90), χ2(2) = 5.75, p = 0.05. The parameter estimates for the final model for self-compassion and all other outcome measures are presented in Table 5.

Table 5 Final model parameter estimates

Discussion

Over the past decade, there has been overwhelming evidence of an increasing mental health crisis among undergraduate students (Lipson et al., 2018). During this same period, there has been an increase in research evaluating the benefits of self-compassion for various mental health concerns, including shame (e.g., Ferrari et al., 2019). Interventions focused on self-compassion comprise numerous techniques and exercises to help individuals de-shame and develop a warmer, more caring intrapersonal relationship (Matos & Steindl, 2020). One such technique is self-compassionate letter writing, which helps individuals develop a supportive self-dialogue, rather than a self-critical, shaming one. Over the past few years, several studies have demonstrated the psychological benefits of self-compassionate letter writing in both clinical and undergraduate populations (Dreisoerner et al., 2020; Johnson & O’Brien, 2013; Kelly & Waring, 2018; Stern & Engeln, 2018; Wong & Mak, 2016). The current study sought to contribute to the existing literature by examining the efficacy of a 2-week online self-compassionate letter-writing intervention for undergraduate students with high shame, a significant transdiagnostic vulnerability factor.

We hypothesized that compared to participants in the control group, those who engaged in self-compassionate letter writing would experience greater reductions in global shame, external shame, self-criticism, general anxiety, and depressive symptoms, as well as greater increases in self-compassion. With respect to shame, our hypothesis was supported; practicing self-compassionate letter writing was associated with significant decreases in both global and external shame, with medium to large between- and within-group effect sizes. Practicing self-compassionate letter writing was also associated with significantly greater reductions in self-criticism, with medium between- and within-group effect sizes. Notably, these gains were maintained during the 1-month follow-up period, suggesting that improvements in shame and self-criticism were sustained.

As hypothesized, general anxiety decreased significantly in the intervention condition with medium between- and within-group effect sizes at post-assessment, and there was a further marginally significant reduction during the follow-up period. However, undergoing a change in concurrent treatment during the study period was significantly associated with greater reductions in general anxiety during the follow-up period. Therefore, it is unclear whether changes in participants’ treatment outside of the study accounted for the long-term decreases in general anxiety, highlighting the need for replication of this finding.

Our hypothesis about change in depression was not well supported. The group by time interaction was only marginally significant, although in the anticipated direction. However, the fit of the final model was poor relative to the other outcomes already discussed, which undermines confidence in this finding and calls for further research to better understand the effects of self-compassionate letter writing on depression.

Our hypothesis that there would be a greater increase in self-compassion over time in the intervention group was also not fully supported. Again, we found only a marginally significant group by time interaction. However, between- and within-group effect sizes were medium to large, suggesting a possible effect of the intervention on participants’ self-compassion. It is unclear why a self-compassion-focused intervention did not produce larger changes in self-compassion or how this intervention produced significant effects on other outcomes when changes in self-compassion were not more strongly demonstrated. However, this finding is in line with at least one previous study that found self-compassionate letter writing to have a greater direct influence on reducing shame than on increasing self-compassion (Johnson & O’Brien, 2013). It is possible that self-compassion increases more slowly, so that its effects take longer to observe than changes in shame, self-criticism, and anxiety.

Limitations and Future Research

The current study has several limitations. One significant limitation was statistical power. There is currently no gold-standard power analysis method for hierarchical linear modeling. Thus, our a priori power analysis assumed a repeated-measures between-groups analysis of variance design and drew on two prior studies of self-compassionate writing that examined samples of 90 individuals (30 participants in each of the three conditions; Johnson & O’Brien, 2013) and 40 individuals (20 participants in each of the two conditions; Kelly & Waring, 2018). However, a post hoc power analysis (Rosner, 2011) suggested that we were underpowered, possibly somewhere in the range of 40–50% power. It was estimated that a sample size of approximately 100 (n = 50 per condition) would be needed to achieve adequate power. Post hoc power analyses must nonetheless be interpreted with caution (Levine & Ensom, 2001). Furthermore, despite these concerns about power, substantial effects were observed.
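To make the relationship between sample size and power concrete, the following is a minimal illustrative sketch rather than the procedure used in this study. It assumes a simple two-sided comparison of two independent group means with a normal approximation, which is far simpler than hierarchical linear modeling and is not the Rosner (2011) method cited above. The function names and the effect size passed in the usage lines are hypothetical choices for illustration, not values estimated from our data.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Illustrative only: normal-approximation power for a two-sided test comparing
# two independent group means. This is NOT the analysis used in the study.

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_group_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect standardized mean difference d
    with n_per_group participants in each of two groups."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region under the alternative.
    return (_STD_NORMAL.cdf(noncentrality - z_crit)
            + _STD_NORMAL.cdf(-noncentrality - z_crit))


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    hypothetical_d = 0.5  # assumed effect size, chosen only for illustration
    for n in (20, 30, 50):
        print(n, round(two_group_power(hypothetical_d, n), 2))
    print(n_per_group_for_power(hypothetical_d))
```

Under these simplifying assumptions, power rises with the per-group sample size for a fixed effect size, which is the general pattern behind the estimate that a larger sample would have been needed. A power analysis tailored to the multilevel, repeated-measures structure of the data could give different figures.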

It also warrants mention that much of this study was conducted during the COVID-19 global pandemic, which affected enrollment. Enrollment rates during the pandemic months were roughly 50% of those during equivalent non-pandemic months. However, the slow recruitment rate may also have been attributable to students perceiving the study as burdensome. Our numbers reflected a higher refusal rate in the intervention condition as compared to the control condition. The perceived burden of the intervention may be one explanation for this finding. It is also possible that some prospective participants felt fearful about engaging in a self-compassion-focused intervention altogether (Gilbert et al., 2011). Those who participated in the intervention condition completed, on average, eight self-compassionate letters out of a maximum of 14 (participants who completed at least one letter were included in analyses). It is possible that less than 30 min would have been sufficient for the self-compassionate letter-writing exercise, and that asking participants to spend 10–15 min on the exercise would have been perceived as more feasible and approachable.

Our findings additionally warrant some discussion about gender. In the current study, identifying as female was associated with greater decreases in external shame and general anxiety during the follow-up period. This raises the question of whether those who identify as women may be more likely to experience long-term benefits of compassionate mind training. Because this sample was predominantly women (approximately 81%), further research is needed to examine whether the outcomes of self-compassionate letter writing truly differ by gender. Future research should examine similar interventions in samples with greater diversity in gender identity.

Future research is needed on how interventions like self-compassionate letter writing may prevent or reduce mental health burden and enhance accessibility of mental health resources. The intervention examined here is highly cost-effective and accessible, and it targets shame, an important transdiagnostic vulnerability factor. It was conducted fully online and guided only by pre-recorded audio without the need for a live clinician. There has been increased research on technology-based mental health interventions, especially for university students (Lattie et al., 2019). Interventions of this nature may help those who could benefit from self-compassion but who do not have immediate access to mental health services, whether because of long waitlists and high demand on university counseling center resources or because of stigma and shame, which have been identified as powerful barriers to treatment seeking (Goetter et al., 2020). Other barriers, including limited financial resources, lack of health insurance, or geographic location, may also be addressed with accessible online interventions like this one. Additionally, it may be beneficial to adapt a similar intervention for child and adolescent populations. It would be valuable to examine whether early preventive interventions of this nature have the potential to enhance resiliency and decrease the prevalence of psychological disorders that often have their onset in early adulthood. Further research may additionally examine whether interventions like self-compassionate letter writing help decrease the severity of psychological symptoms and suicide risk while individuals remain on clinic waitlists. For example, university counseling centers may consider examining whether self-compassionate letter writing can decrease distress among students waiting to be seen by a professional and whether it has any potential to decrease the number of students who need to be seen by a live therapist.