Embracing the Complexity of our Inner Worlds: Understanding the Dynamics of Self-Compassion and Self-Criticism

Although research in self-compassion has been rapidly growing, there is still substantial controversy about its meaning and measurement. The controversy centers on Neff’s popular Self-Compassion Scale (SCS) and the argument that compassionate self-responding (CS) and uncompassionate self-responding (UCS) are a single dimension versus the argument that they are two semi-independent, unipolar dimensions, with UCS not reflective of “true” self-compassion. We review the evidence for both positions and conclude that the data cannot yet resolve the debate. Neither position is proven to be right or wrong. We recommend the way forward is to let go of traditional factor analytic approaches and examine self-compassionate behavior as a dynamic network of interacting processes that are influenced by context. This leads us to three classes of testable hypotheses. The link between CS and UCS will depend on the timeframe of measurement, current circumstances, and individual differences. We propose a middle ground to the SCS debate; rather than supporting the single total score, 2-factor score (CS and UCS) or the 6-factor score (the six subscales of the SCS), we argue these constructs interact dynamically, and the decision of which scoring method to use should depend on the three testable contextual hypotheses.

Self-compassion research is exploding, with hundreds of studies and many meta-analyses supporting the benefits of self-compassion interventions across several areas (Ferrari et al., 2019; Inwood & Ferrari, 2018; Kirby et al., 2017; Turk & Waller, 2020; Wilson et al., 2019). Perhaps the most used measure in the area, Neff's Self-Compassion Scale (SCS; Neff, 2003) has been cited over 6470 times (based on Google Scholar, April 11, 2022). Having a valid and clearly understood scale is important for practical and theoretical reasons. Practically, a scale allows one to not only evaluate the mechanisms of change in self-compassion interventions, but also inform and guide practitioners about how best to intervene. Theoretically, the scale posits the structure and content of self-compassionate behavior, and guides the way we think about self-compassion and conduct research into it. A valid scale is essential.
There has been substantial debate about the validity of Neff's SCS, a debate that cuts to the meaning of self-compassion. This debate has been heated, with publication titles including "The Forest and the Trees…", "Stripping the forest from the rotten trees…" (Muris et al., 2019b), and "Setting the record straight about the Self-Compassion Scale" (Neff, 2019). One set of researchers has argued that the SCS comprises semi-independent, unipolar continuums, which we might label as compassionate self-responding (CS) and uncompassionate self-responding (UCS; Brenner et al., 2017; Muris et al., 2016, 2019a). The evidence in favor of this position is two-fold: First, factor analysis shows that SCS items form a positive cluster and negative cluster (Brenner et al., 2017; Muris et al., 2016). Second, UCS scales have been shown to more strongly predict psychopathology than CS scales (Muris, 2016). The implication of these results is that people can be high in both compassionate and uncompassionate responding, low in both, or high in one but not the other. People have "multiple inner voices" or ways of relating to themselves, sometimes positive, sometimes negative. This is consistent with research showing a weak link between adaptive and maladaptive behavior (Ciarrochi et al., 2022a). These findings have led Muris to argue that negative items should be removed from the scale as they do not represent true self-compassion and inflate the correlation between self-compassion and psychopathology (Muris, 2016).
Challenging this view, Neff argues that self-compassion forms a bipolar continuum, ranging from CS to UCS (Neff, 2022). Neff does not disagree with any of the findings, but argues that the findings do not invalidate the bipolar continuum hypothesis. Specifically, she argues that Muris and others have fallen into the differential effects fallacy, or the idea that two ends of a continuum cannot differentially predict outcomes. Neff uses the example of temperature to illustrate the point. Cold may predict frostbite better than heat, but they are still part of the same continuum, and heat can prevent frostbite. Neff's core argument is that CS and UCS dynamically relate, with increases in compassionate responding inhibiting uncompassionate responding. Consistent with this view, research has found that a self-compassionate mood induction both increased CS and decreased UCS (Arimitsu, 2014; Neff et al., 2021). Neff also supports the bipolar idea by showing that all factors in the self-compassion scale load on a single global factor, as well as six subfactors (Neff et al., 2021).
The most interesting part about this debate is that both sides agree on the quantitative findings. The debate is based on what people think the findings imply. All models assume six specific factors influence responses to SCS items: self-kindness, common humanity, mindfulness, self-judgment, isolation, and over-identification. The models differ in the extent they see the items as also caused by a single global factor (self-compassion; Neff et al., 2021) or two global factors (CS and UCS; Brenner et al., 2017). However, the important thing is that both sides agree on the specific factors. It follows from the assumptions of these psychometric models that CS and UCS subscales have a unique, latent "cause," besides the global cause. To put this in concrete terms, based on these models, we should be able to come up with experimental manipulations that uniquely influence self-kindness more than self-judgment, and vice versa. This is an experimental version of the differential effect approach described above. To use Neff's temperature metaphor, if UCS and CS have semi-independent causes, you should be able to make someone feel generally cold and hot at the same time.
If we shift our focus from pure psychometric validity to utility, we see Neff's view more clearly. We agree with Neff that demonstrating CS and UCS predict different criteria does not prove that treating CS and UCS as separate factors is practically useful. We know that CS and UCS negatively correlate, and that self-compassion interventions change both CS and UCS (Arimitsu, 2014; Neff et al., 2021), so at the present time there is no experimental demonstration of differential effects. Perhaps all the action is in the global factor, as Neff suggests, and the differences between subfactors are trivial.
Where does this leave us? We do not have the data to resolve this issue. Neff's data suggests self-compassion interventions move all six factors, but this does not mean that future intervention work will not find something different. A novel psychological intervention may have the ability to alter common humanity without having a strong effect on self-judgment. Using energy as a metaphor may clarify this point. Neff's temperature metaphor describes self-criticism as coldness and self-compassion as warmth; but in energy terms self-compassion may also be considered light. We will find a correlation: things that are cold are also dark (e.g., the far side of the moon), and things that are bright are also hot (e.g., our sun). These expressions of energy support Neff's argument that sometimes it is appropriate to treat constructs as opposites along a dimension. However, we also find that some things can be bright but cool to the touch (an LED light) and some things can be not very bright but very hot (a hot plate stove top, the flame of a candle). Therefore, sometimes, we should treat CS and UCS as different constructs because the distinction is useful to advance our understanding of human minds.
In astronomy, differentiating luminosity from the temperature of stars is important in the classification of stars. The Hertzsprung-Russell diagram (Fig. 1) shows that most stars lie in the main sequence with brighter stars also being very hot and dimmer stars being relatively colder. But it also shows that some stars can be very bright but relatively cold (red giants) while some stars can be very hot but relatively dim (white dwarfs). Similarly, only discussing the SCS total score, not the subscores, may limit our ability to understand the idiosyncratic complexity of our inner states. The issue becomes even more precarious as newer statistical approaches take hold that challenge the utility of traditional psychometrics based on groups of people (Hayes et al., 2019, 2020a; Wright & Woods, 2020). Let us inspect those assumptions now and how they might bias our thinking.

The Traditional Psychometric Approach
Let us start by assuming that the same latent construct, self-compassion, causes CS and UCS items (Fig. 2). This type of model appears to fit state-self-compassion data extremely well, especially if items are allowed to crossload (Neff et al., 2021). Neff et al. (2021) report a CFI fit index of 0.996. Given a CFI of 0.95 is considered excellent and 1 is perfect fit (Hu & Bentler, 1999), it would seem like the discussion about factor structure is closed. The model fits too well to challenge.
However, we should be careful to not fall in love with these fit indices in psychometric models. Two models can be radically different in their assumptions, yet be statistically equivalent, in that, regardless of the data, the two models would yield the same correlation, covariation, and other matrices, and also yield identical goodness of fit indices (Bentler & Satorra, 2010). This means there are models that are statistically equivalent to Fig. 2, including ones that assume the subfactors cause each other, rather than independently cause responses. Thus, a model that fits extremely well is not necessarily the most accurate or useful model. This leads us to ask, what other models might help us understand self-compassionate responding? To keep things simple, we will focus on what might influence self-kindness and self-judgment. These two scales typify CS and UCS, the core focus of the debate between Neff and Muris. However, our arguments could apply to any aspect of the self-compassion model (e.g., the link between mindfulness and common humanity).
Our core argument is that the strength of the link between self-kindness and self-judgment is not only driven by specific and global latent variables, as suggested in Fig. 2. It is also driven by three aspects of context, namely, time frame, current circumstances, and individual differences. Accounting for these three aspects of context could help to inform how we understand these constructs and how we operationalize them through scoring the SCS.

Hypothesis 1: Time Frame Moderates the Structure of Self-kindness and Self-judgment
The time frame within which self-kindness and self-criticism are assessed is likely to influence the complexity of inner dialog, or the size of intercorrelations. Specifically, we hypothesize that as the time frame decreases, the correlation will increase between positive and negative aspects of self-compassion, as has been observed between positive and negative affective states (Dejonckheere et al., 2021). Indeed, when people are asked about their self-compassionate states "right now," correlations between self-compassion subscales are relatively large in absolute magnitude, ranging from 0.39 to 0.87 (see Table 6 in Neff et al. (2021)). In contrast, the data we describe below focus on general tendencies, and the magnitude of the correlations ranged from 0.06 to 0.66 (Table 1). For example, the link between self-kindness and self-judgment is 0.73 in the Neff state study, but only 0.30 in the trait study we report below. Thus, self-compassion appears to be less unidimensional the more people reflect on their life and the less focused they are on a specific moment. The greater the time span, the more chances people have to behave inconsistently. Further research is needed, using the same sample of people, to examine if narrowing the time frame produces increased bipolarity.

Fig. 1 The Hertzsprung-Russell diagram, which shows temperatures of stars plotted against their luminosities. Credit: European Southern Observatory, https://www.eso.org/public/images/eso0728c/, licensed under CC BY 4.0

Hypothesis 2: Current Circumstances Moderate the Structure of Self-compassion
Dejonckheere et al. (2021) hypothesized that our affective systems shift from relative independence to stronger bipolarity when we experience personally relevant concerns. They conducted a daily diary study and found that the link between positive affect and negative affect became increasingly negative as the participants anticipated a test result, indicating higher bipolarity, and then became more independent as the evaluative event passed. Will the same be true for self-compassion? For example, immediately after a failure or setback, do people engage in self-talk that is either kind or cruel, but not both? If they experience a compassionate event, say a friend saying something supportive, does this make the self-compassionate construct more bipolar? If self-kindness inhibits self-judgment (Neff, 2003), then this is what we would expect. Neff et al. present a table of correlations (Table 7 in Neff et al. (2021)) that suggests that this may indeed be happening. The correlations between self-kindness and the three forms of UCS responding (self-judgment, isolation, and over-identification) are smaller pre-intervention (r = 0.53, 0.51, 0.45, respectively) than post-intervention (r = 0.62, 0.55, 0.58), hinting at the possibility that self-compassionate interventions increase the bipolarity of self-compassion. A future diary study is needed, similar to Dejonckheere et al., to formally evaluate this possibility.
Once we recognize that context may influence the strength of association between self-kindness and self-judgment, we open the door for many interesting hypotheses. For example, a context that reinforces inter-team competition, comparison, and criticism (Ntoumanis & Vazou, 2005) may motivate people to minimize common humanity ("I am better than you"), and also lead to both self-kindness ("I have to look after myself to win") and self-judgment ("I have to be hard on myself"). In contrast, a therapeutic context that encourages self-exploration may cause the three forms of compassionate responding to co-occur. For example, therapy might teach me to be kind to myself and mindful, so that I learn more about myself and also that I am normal and do not need to fix myself (common humanity).

Hypothesis 3: Individual Differences Moderate the Structure of Self-compassion
Our third hypothesis is more radical than the first two and needs further justification, both in terms of theory and data. If this hypothesis is correct, it means there is an exciting new world of research possibilities opening up that might lead to a very different way to examine all measures. If so, we can set aside traditional, group-based psychometric approaches in favor of more "idionomic," or individual-level, approaches, and make discoveries that allow us to personalize interventions and possibly increase effect sizes (Ciarrochi et al., 2022b; Hayes et al., 2020a, b; Sanford et al., 2022).
Traditional models are based on group averages and the ergodic assumption that the behavior of the collective models the behavior of individuals (Birkhoff, 1931; Hayes et al., 2019, 2020a). To translate this assumption to self-compassion measurement, one ergodic assumption would be that the negative relationship between self-kindness and self-judgment observed at the group level between people applies to each individual across time and situations. It is this very assumption that vitalizes the practical and personal implications of existing research on self-compassion, but if this assumption is violated, then statistical techniques based only on group averages cannot be used to model individual structure and change (Molenaar, 2013; Molenaar & Campbell, 2009).
To illustrate this point, we look at archival data from the Australian Character Study (Ciarrochi et al., 2017, 2020). We provide some evidence-based examples that are consistent with Hypothesis 3 that individuals can differ in self-compassion structure. The analysis here is intended to encourage further evaluation of Hypothesis 3 rather than acting as a definitive test of it. The best data to test Hypothesis 3 will be longitudinal time series data, like those collected by Fisher et al. (2019) and Sanford et al. (2022).

Empirical Examples Consistent with Hypothesis 3
These findings are based on 1939 (970 females, 969 males) students in Grade 10 (mean age = 15.65, SD = 0.43) across 16 schools, who completed self-compassion assessments. Self-compassion was measured using the 12-item short form of the Self-Compassion Scale (Raes et al., 2011). Participants indicated their agreement with statements on a 5-point scale (1 = "almost never" to 5 = "almost always"). Higher mean scores indicate higher levels of self-compassion. The scale has six two-item subscales, namely self-judgment ("I'm disapproving and judgmental about my own flaws and inadequacies"), over-identification ("When I'm feeling down I tend to obsess and fixate on everything that's wrong"), isolation ("When I'm feeling down, I tend to feel like most other people are probably happier than I am"), self-kindness ("I try to be understanding and patient towards those aspects of my personality I don't like."), common humanity ("When I feel inadequate in some way, I try to remind myself that feelings of inadequacy are shared by most people."), and mindfulness ("When something upsets me I try to keep my emotions in balance"). Table 1 illustrates the correlations between the subscales, a pattern typically found in the field (Brenner et al., 2017; Raes et al., 2011). The CS subscales correlate more strongly with each other than with UCS, and vice versa, suggesting that positive and negative forms of self-compassion load on different factors. The correlations between positive and negative scales range from medium (−0.30; self-kindness and self-judgment) to small (−0.06; isolation and common humanity). These correlations are based on group averages, but the correlations are sufficiently small that they suggest substantial individual differences. They mean, for example, that a high score on a CS factor will not always be associated with a low score on a UCS factor.
To illustrate this point, we have broken self-kindness and self-judgment into tertiles and present a cross-tabulation of the results in Table 2. This table illustrates how the group average can be misleading. If people are in the highest tertile of self-kindness, they are often in the lowest tertile of self-judgment (48.2%), and vice versa (49.5%). This is consistent with the negative correlation between the two variables. However, Table 2 also illustrates that a substantial number of people are high in both self-kindness and self-judgment (24.7%) and low in both (26.8%). Thus, the data from approximately ¼ of the participants do not fit the traditional factor model.
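The tertile cross-tabulation described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a reanalysis of the Australian Character Study: the scores are simulated standard-normal values with a correlation near −0.30, and the names `tertile_labels` and `row_percent_crosstab` are hypothetical helpers introduced only for this example.

```python
import random
from statistics import quantiles


def tertile_labels(values):
    """Assign each value to tertile 0 (low), 1 (middle), or 2 (high)."""
    low_cut, high_cut = quantiles(values, n=3)  # the two tertile cut points
    return [0 if v <= low_cut else 1 if v <= high_cut else 2 for v in values]


def row_percent_crosstab(row_groups, col_groups):
    """Cross-tabulate two tertile codings and return row percentages."""
    counts = [[0, 0, 0] for _ in range(3)]
    for r, c in zip(row_groups, col_groups):
        counts[r][c] += 1
    return [[100.0 * n / sum(row) for n in row] for row in counts]


# Synthetic data only (an assumption for illustration): two standard-normal
# scores constructed to correlate at about -0.30.
rng = random.Random(0)
target_r = -0.30
kindness, judgment = [], []
for _ in range(2000):
    k = rng.gauss(0, 1)
    j = target_r * k + (1 - target_r ** 2) ** 0.5 * rng.gauss(0, 1)
    kindness.append(k)
    judgment.append(j)

# Rows: self-kindness tertile; columns: self-judgment tertile (low, middle, high).
table = row_percent_crosstab(tertile_labels(kindness), tertile_labels(judgment))
for name, row in zip(("low", "middle", "high"), table):
    print(f"self-kindness {name:>6}: " + "  ".join(f"{p:5.1f}%" for p in row))
```

Even with a clearly negative correlation, the "high self-kindness, high self-judgment" cell of such a table stays well populated, which is the pattern the text draws attention to.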
The table is based on cross-sectional data. Do we see similar effects longitudinally? If young people increase in self-kindness from 1 year to the next, do they also decrease in self-judgment, as we would expect from self-compassion theory and psychometric modeling (Gilbert, 2009; Neff et al., 2021)? We addressed this question with archival data from the Australian Character Study, which assessed self-compassion in the same youth from Grades 9 to 12. A total of 952 participants (464 male; 488 female) completed at least three of the four waves of the study and are reported here (see Ciarrochi et al. (2019) for details of sample). Multilevel analysis was used to examine how within-person changes in self-kindness correlated with within-person changes in self-judgment. We compared a random intercept model that assumed the same association between self-kindness and self-judgment across people to a random intercept and random slope model that assumed differing associations across persons. We found the statistical difference between the two models to be highly significant (χ²(2) = 203.7, p < 0.00001), suggesting that the strength of association between self-kindness and self-judgment differs from person to person. In the random slope model, the fixed effect association between self-judgment and self-kindness was B = −0.26 (SE = 0.02, t = −12.1, p < 0.001), indicating that increases in self-kindness were generally associated with decreases in self-judgment. However, there was substantial variation in this effect, as seen in Fig. 3. The lines with decreasing slopes represent youth that experienced lower self-judgment during years when they experienced higher self-compassion. In contrast, for a substantial number of youth, there was either no link between the two variables, or in some instances, increasing self-kindness was associated with increasing self-judgment (the positive sloping lines). At the group level, fixed effects did not apply to these individuals. The nature of their individual lives disappeared into "error" in the fixed effects estimation, which is a concrete demonstration of why failures of ergodicity such as this are both practically and conceptually troublesome. Since the ergodic assumption is built into the classical psychometric model (Molenaar, 2004), further adjustments to psychometric approaches appear to be necessary going forward. Empirically, this might entail high density longitudinal measurement of self-compassion, hypothesized outcomes, and related processes followed by idiographic dynamic network modeling, considering subgroup clusters that result only if they improve idiographic fit, which has been termed an idionomic approach (Hayes et al., submitted).
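To make the contrast between a group-level fixed effect and person-specific slopes concrete, the following Python sketch simulates people whose self-kindness and self-judgment slopes vary around the reported average of −0.26. It is a simulation under stated assumptions, not the multilevel analysis reported above: the slope spread (`SLOPE_SD`), the residual noise, and the sample size are hypothetical values chosen for illustration, and per-person ordinary least squares stands in for a random slope model. The helper `chi2_df2_p_value` uses the fact that a chi-square distribution with 2 degrees of freedom has the upper-tail probability exp(−x/2).

```python
import math
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x for one person's repeated measures."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def chi2_df2_p_value(statistic):
    """Upper-tail p-value of a chi-square statistic with 2 df: exp(-x / 2)."""
    return math.exp(-statistic / 2.0)


rng = random.Random(1)
AVERAGE_SLOPE = -0.26    # fixed effect reported in the text
SLOPE_SD = 0.30          # hypothetical spread of person-specific slopes
N_PEOPLE, N_WAVES = 500, 4  # hypothetical sample size; four yearly waves

true_slopes, estimated_slopes = [], []
for _ in range(N_PEOPLE):
    slope = rng.gauss(AVERAGE_SLOPE, SLOPE_SD)          # this person's own slope
    kindness = [rng.gauss(0, 1) for _ in range(N_WAVES)]
    judgment = [slope * k + rng.gauss(0, 0.5) for k in kindness]
    true_slopes.append(slope)
    estimated_slopes.append(ols_slope(kindness, judgment))

mean_slope = sum(estimated_slopes) / N_PEOPLE
share_positive = sum(s > 0 for s in true_slopes) / N_PEOPLE
print(f"mean estimated slope across people: {mean_slope:.2f}")
print(f"share of people whose true slope is positive: {share_positive:.0%}")
print(f"p-value for chi-square(2) = 203.7: {chi2_df2_p_value(203.7):.1e}")
```

In this sketch the average slope is negative while a sizeable minority of simulated people have a positive slope, so the average describes the group without describing each member. With only four waves per person, individual slope estimates are noisy, which is one reason high density longitudinal measurement is suggested for testing Hypothesis 3.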

A Process-Based Approach to Self-compassion
Neff's and others' view that CS inhibits UCS (Gilbert, 2009; Neff, 2003) points to an alternative way of thinking about self-compassion. Neff says "While the six elements of self-compassion are separable, they are thought to mutually impact one another and interact in a system" (p. 122; Neff et al., 2021). We agree. Rather than seeing the components of self-compassion as being caused by a unidimensional construct, we can see them as a system of processes that interact and influence each other (Ciarrochi et al., 2021; Hayes et al., 2019, 2020a). We do not have to assume that the same processes drive self-kindness for everyone, nor that these processes are independent of context. There are contexts where self-compassion behaves as a unidimensional construct but there are contexts where we would lose a rich understanding of the dynamic processes involved in self-compassion if we combined constructs which are separate and can mutually inhibit each other.
Our proposal has implications for the scoring method of the SCS. Neff has led several psychometric studies which steadfastly defend the use of a SCS total score (Neff, 2016, 2020; Neff et al., 2017, 2019). In comparison, several different research teams from across the globe have found stronger support for the 2-factor (UCS and CS subscales) or the 6-factor (self-kindness, self-judgment, common humanity, isolation, mindfulness, over-identification subscales) scoring methods for the SCS over the use of a single, unidimensional score (López et al., 2015; Muris & Petrocchi, 2017; Muris et al., 2016, 2021; Zhang et al., 2019). Based on the contextual hypotheses of this commentary, we propose a compromise. We strongly support the reporting of the 2-factor or 6-factor solution of the SCS. When authors have a theoretical or empirical rationale for using the total score, a subscale analysis should also be included for complete and transparent reporting of data. If this level of detail exceeds journal word counts, additional subscale analyses can be reported in supplementary materials. This is a recommendation some authors of this commentary have not followed in the past, but intend to in the future. Indeed, a commitment to the scientist-practitioner model requires constant hypothesis testing and reevaluation of our understanding in light of additional evidence (Muris & Otgaar, 2020).

Fig. 3 The within-person association between yearly changes in self-compassion and yearly changes in self-judgment

Practical Implications
How might the ideas discussed in this paper influence the practitioner? Let us say the practitioner begins with a process-based case conceptualization of two clients, as illustrated in Fig. 4 (see Hofmann et al. (2021) for details of PBT case conceptualization). Practitioners can draw this out perhaps after the first session, as they develop an understanding of the client's self-compassion. This conceptualization would be updated and revised as the practitioner works with the client. Black arrows show processes that are positively linked, and clear arrows show a negative link. For person A, self-judgment reduces the extent that they endorse common humanity, or are mindful and kind to themselves. If this case conceptualization is accurate, then targeting self-judgment might be an effective way to improve all the other processes downstream. Self-kindness is "downstream" and would not be targeted first, since, at least in this model, it would not reduce self-judgment. In contrast, for person B, self-kindness and self-judgment are negatively related. Increasing self-kindness for this person would diminish self-judgment and indirectly reduce over-identification and isolation. Self-judgment, over-identification, and isolation form a self-amplifying loop, and disrupting one of these processes might disrupt all three. The above case conceptualization could be based on clinical judgment and experience. Future research is needed to examine the utility of such a process. However, these conceptualizations can also be derived empirically. One can use dedicated apps, wearables, and other methods to collect intensive within-person data and examine within-person structures over time (Fisher et al., 2017; Sanford et al., 2022). For example, Sanford et al.'s (2022) recent study could provide a model for future self-compassion work. These researchers assessed psychological processes utilizing an experience sampling design and 60 measurement occasions per person, allowing them to examine within-person differences in processes and outcomes. They found no psychological process (out of 18) that universally affected an individual in the same way. For example, for some participants, feeling stuck and unable to change was a key driver of sadness, whereas for others, a lack of mindfulness was the key driver. This kind of "idionomic" empirical analysis would allow the practitioner to tailor an intervention to the needs of the clients, with the first client being supported to "get unstuck" and make behavioral changes, and the second client being shown how to use mindfulness practices.

Fig. 4 Two process-based case conceptualizations of self-compassion

Conclusions
The battle between factor models of self-compassion is unlikely to be resolved. As long as we focus on debating the "true structure" of the SCS and the "true latent factors" underpinning it, we are likely to stay at a stalemate. There will be many factor models that are statistically equivalent or close to equivalent, so statistics won't decide the issue. Nor will these models help us decide whether self-judgment is an essential part of the structure of self-compassion (Neff, 2022) or not a part of self-compassion and should be removed from the scale (Muris, 2016). The "true" nature of self-compassion is likely to be an assumptive and definitional issue, rather than something that can be sorted out statistically.
However, if we see traditional factor analysis as just another useful tool and not the ultimate arbiter of truth, we are free to let go of that tool and try something else. We have suggested a shift away from studying the structure of self-compassion as a latent construct to understanding self-compassionate behavior as a system of interacting processes that may be influenced by aspects of time frame, current situation, and the individual. We believe Neff (2022) and Gilbert (2009) are proposing this kind of model when they suggest that stimulating feelings of safety, warmth, and connectedness will counteract non-compassionate responding and over-arousal of the threat system. Their theoretical model suggests a reciprocal influence between self-kindness and self-judgment, but their statistical model does not make this association explicit (e.g., Fig. 2).
Instead of allowing individual human beings to disappear into a statistical fog of between-person variability, it seems kinder and more compassionate to give each individual and their own lived experience a voice. That can be readily done by looking at life as it is lived, assessed via high density longitudinal measurement, and then examining the role of processes such as self-compassion and its putative elements within each individual person against the background of variability in their own life moments. From there it is possible to explore the varieties of nomothetic patterns, provided they increase the precision of idiographic modeling. Arguing purely based on classical factor analysis has so far created more heat than light, but this idionomic alternative will allow researchers to collect data that fits the ultimate purpose of this empirical struggle: being able to better understand and empower people.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.