“I don't believe psychological treatments will help me – but don't tell my therapist!” Negative prognostic beliefs about the effectiveness of psychotherapy (negative outcome expectations; OE) are problematic as they can impair the success of psychological treatments (Constantino et al., 2018; Dew & Bickman, 2005; Greenberg et al., 2006). Critically, OE are usually assessed with direct measures only (e.g., via questionnaires), which are subject to distortions by social desirability and various other motives (Greenwald et al., 1998). Therefore, direct measures might not fully capture the actual underlying expectations, and we may need additional, more unobtrusive measures that are not as easily controllable (Strack & Deutsch, 2004). This study aimed to develop and validate a single category implicit association test (SC-IAT) to measure the OE of psychotherapy indirectly.

Expectations in Psychological Treatments

OE can be defined as a person's future-directed beliefs about whether psychotherapy will effectively reduce their symptoms (Kirsch, 1985; Laferton et al., 2017). They can be differentiated from generalized expectations, which span several contexts and are more similar to associations between two constructs (e.g., psychotherapy associated with effectiveness; Laferton et al., 2017). Therapy motivation also differs from OE since it includes a readiness to engage in psychotherapy but does not necessarily encompass effectiveness beliefs (Constantino, 2012; Norcross et al., 2011). Some researchers define the credibility of treatment as a construct distinct from OE (Devilly & Borkovec, 2000). However, credibility could also be a prerequisite for expectations (e.g., if patients do not find a treatment logical, they cannot expect it to be effective for them; Panitz et al., 2021).

Expectations have been assigned an increasingly central role in medical and psychological studies and treatments (Constantino et al., 2018; Rief & Glombiewski, 2016; Rief et al., 2022). Several researchers have investigated whether OE differ across sociodemographic characteristics and clinical conditions. Previous results linked less positive OE to older (vs. younger) age, male (vs. female) gender, lower (vs. higher) education, and higher anxiety and depression (Cohen et al., 2015; Constantino et al., 2018; McHugh et al., 2013; Silverman et al., 2021; ten Have et al., 2010; Vîslă et al., 2019). However, these findings were very inconsistent across studies. Overall, patients with treatment experience expressed more positive OE than those who had never experienced psychotherapy (MacNair-Semands, 2002; Silverman et al., 2021). In this context, satisfaction with the treatment was related to positive OE, whereas negative experiences were related to negative OE (Tran & Bhar, 2014).

Critically, negative OE were associated with adverse effects in clinical and psychotherapeutic practice, for instance, increased pain (Bingel et al., 2011; Corsi & Colloca, 2017) or impaired effectiveness of psychotherapy (Constantino et al., 2018; Dew & Bickman, 2005; Greenberg et al., 2006). Consequently, we need reliable and valid measurements of OE to identify negative expectations so that we can change them and prevent their adverse effects.

The Measurement of Expectations

Past research on OE is typically based on direct measures (Laferton et al., 2017). Direct measures openly ask participants, verbally or via questionnaires, to what extent they assume psychotherapy to be effective. However, a construct of interest can also be assessed via indirect measures. In indirect measures, the construct of interest (here: OE) is inferred from participants’ performance in a given task.

Since mental health treatment is stigmatized, there are potential biases in the self-report measures of psychotherapy OE (Corrigan, 2004; ten Have et al., 2010). For instance, if patients feel pressured to express positive expectations to satisfy their therapist but hold negative expectations, the directly measured expectations could be invalid (Grimm, 2010). This effect is not only present in psychological treatments but is also well known in psychological studies, where participants might respond in favor of the hypothesis to be a "good" participant (demand effect; Orne, 1962), which could distort the measured OE. Furthermore, some patients might have problems expressing their expectations directly, for instance, when they do not want to express their true expectations, do not know them, or are unable to express them because of a lack of introspective abilities (Nosek et al., 2011). In these cases, an indirect measurement would be helpful because it might be more unobtrusive and not as easily controllable. Consequently, the measured OE might be less influenced by self-presentational distortions and the participant's knowledge of their beliefs and attitudes (Greenwald et al., 1998).

Another reason we might need indirect measures is that they could contribute to predicting health-relevant behavior. In particular, a comprehensive meta-analysis demonstrated that both direct and indirect measures predicted behavior (for instance, an obese + low performance | normal weight + high performance IAT predicted hiring decisions). Importantly, indirect measures predicted behavior more stably, independent of the study characteristics, target groups, and type of behavior (Kurdi et al., 2019; Rüsch et al., 2009). Furthermore, when direct and indirect measures match, psychotherapy success might be most likely (Rief et al., 2022). For instance, exposure success might be most robust when expectation change can be detected via both direct and indirect measures. In summary, these results suggest that indirect measures of OE could add value beyond the direct measure by contributing to predicting health-relevant variables.

Indirect Assessment of Expectations Toward Psychotherapy

Only three studies to date have investigated computer-based indirect measures to capture psychotherapy OE (Goguen et al., 2016; Pfeiffer et al., 2022; Silverman et al., 2021). In particular, they used the Implicit Association Test (IAT), one of the most widely used indirect measurement methods (Greenwald et al., 1998). In an IAT, participants are asked to assign words presented in the middle of the screen (e.g., “psychological treatment”, “useful”) as quickly as possible to categories shown left and right on the screen by pressing a key. In one block, “psychotherapy + effective” were assigned to the same key (and “medication + unhelpful” to another key). In the other block, “psychotherapy + unhelpful” were assigned to the same key (and “medication + effective” to another key). According to the IAT framework, faster responses in the “psychotherapy + effective” (and “medication + unhelpful”) block as compared to the “psychotherapy + unhelpful” (and “medication + effective”) block speak for positive psychotherapy OE (stronger associations between psychotherapy + effective | medication + unhelpful compared to psychotherapy + unhelpful | medication + effective). The OE IAT demonstrated moderate internal consistency (r = 0.58; Silverman et al., 2021) and reasonable construct validity indicated by the significant correlations between the indirect and direct measures (r = 0.07–0.32; Goguen et al., 2016; Pfeiffer et al., 2022; Silverman et al., 2021). Concerning the overall means, however, Silverman et al. (2021) found positive psychotherapy OE, Goguen et al. (2016) found positive medication OE, while Pfeiffer et al. (2022) found no effect in the overall sample.

Critically, using this IAT paradigm, we can only interpret the results relative to the reference category used. Positive psychotherapy OE means that participants perceived psychotherapy as more effective than medication. Positive medication OE means that participants perceived medication as more effective than psychotherapy. However, we can only interpret the relative preference of medication over psychotherapy (or vice versa). In the case of positive psychotherapy OE, both psychotherapy and medication could be assumed to be effective, with psychotherapy slightly more effective than medication. Alternatively, both could be assumed to be unhelpful, with medication slightly more unhelpful than psychotherapy. Likewise, no effect in the IAT could mean that psychotherapy and medication are both considered very unhelpful or both very effective. Consequently, we cannot draw conclusions about the underlying psychotherapy expectations from this IAT. In particular, in studies where medication might not be relevant or with patients not considering medication, we need a measure of psychotherapy OE independent of medication OE to investigate its influence on psychotherapy outcomes.

In order to rectify this problem and to make a statement about psychotherapy OE only, the single category IAT (SC-IAT) could offer a good solution. The SC-IAT uses only one target category and two attribute categories. It allows the measurement of associations between the target and attribute categories without directly referring to another category. The internal consistency in SC-IATs is usually smaller than in IATs (average of α = 0.80; Greenwald et al., 2021) and self-reports but higher than in other indirect measurements such as evaluative priming (Karpinski & Steinman, 2006). In validation studies, the SC-IAT demonstrated good internal consistency (adjusted r = 0.55–0.85) and validity (indirect-direct correlations r = 0.02–0.38) in measuring soda preferences, stereotypes, attitudes toward homosexuality, and anxiety (Breen & Karpinski, 2013; Karpinski & Steinman, 2006; Stieger et al., 2010). This is why we developed and tested a Therapy SC-IAT in the present study.

Critically, the IAT and SC-IAT have been found to be confounded by factors other than the construct of interest (e.g., cognitive skills, speed accuracy; see Klauer et al., 2010 for further explanations of the method-specific variance). This is why it has been proposed to control for such method-related confounds by, for instance, adding a control (SC-) IAT unrelated to the construct of interest to the experimental setup (see Teige-Mocigemba & Klauer, 2015). If such a control (SC-) IAT shows the same effects as the newly developed target (SC-) IAT, this would mean that method-related confounds of the (SC-) IAT rather than the to-be-assessed construct itself drive the observed effects. To control for such unwanted influences, we thus included an OE-unrelated Flower SC-IAT in our study.

Research Question

The present research aimed to develop and validate an SC-IAT for indirectly measuring OE toward psychotherapy. To this end, we assessed self-report (direct) measures of OE, a Therapy SC-IAT, and a control Flower SC-IAT (used in Klauer et al., 2010) in a large heterogeneous sample. For the validation of the Therapy SC-IAT, we predicted that (i) psychotherapy is more strongly associated with effective than unhelpful (see Seewald & Rief, 2023; Silverman et al., 2021), while flowers are more strongly associated with positive than negative, (ii) the Therapy SC-IAT is positively correlated with the direct measures of OE (convergent validity), and that (iii) the Flower SC-IAT is positively correlated with the Therapy SC-IAT (method-specific variance), while it is not significantly correlated with the direct measures of OE (discriminant validity).

Since negative OE can impair outcomes, people at risk for negative OE should be identified. Therefore, we conducted a regression analysis to investigate whether directly and indirectly measured OE vary across demographic characteristics and psychological disorders. Because of the inconsistency of previous study results, we investigated age, gender, nationality, education, previous psychotherapy experiences, current problems, anxiety, and depression in an exploratory manner without predefined hypotheses. Last, we analyzed the incremental validity by examining whether adding the indirect measure to the direct measures as predictors of experiences with psychotherapy improves the model.



Our recruitment goal was based on the exploratory regression analysis since this required more individuals for adequate power. To detect a small effect size of f² = 0.02 with 0.80 power at the Bonferroni-corrected error probability of 0.0083, 610 participants would be needed (calculated using G*Power; Faul et al., 2007, 2009). We recruited online via university mailing lists and social media from 11th March to 12th May 2022. In total, 1017 participants volunteered to take part using a computer, laptop, or tablet with a keyboard, of whom 278 did not finish the study, 31 did not meet our a priori determined inclusion criteria, and three had to be excluded because they participated twice. A total sample of 705 participants remained (M = 31.13 years, SD = 12.96 years, range = 18–82 years), including 198 men, 502 women, and five non-binary participants; 53.9 % of all participants reported having a current mental health problem (Table 4). Inclusion criteria were: at least 18 years old, German as a native language or at a native language level, no visual impairment that affects reading on the computer or tablet, no severe neurological disorder, and no disorder with psychoses (e.g., schizophrenia or schizoaffective disorder).

Study Design

This study used a within-subjects design with the order of the two SC-IATs (Flower SC-IAT vs. Therapy SC-IAT) counterbalanced across participants. The study was preregistered at the Open Science Framework (OSF) and approved by the University’s local ethics committee (reference number: 2022-02k).


The online study was implemented in SoSci Survey (Leiner, 2019). First, participants gave informed consent, and exclusion criteria were checked. Then, both SC-IATs were presented in a randomized order (within participants). After completing both SC-IATs, the direct measures of OE followed [Credibility Expectancy Questionnaire (CEQ), Devilly & Borkovec, 2000; Attitudes Toward Seeking Professional Psychological Help-Short Form (ATSPPH), Fischer & Farina, 1995; Semantic differentials]. A written statement encouraged the participants to rate the directly measured OE considering a current mental health problem or an imagined problem if they did not have a current problem. Last, participants completed the final questionnaires [Demographic Data; Therapy Motivation; Patient Health Questionnaire (PHQ-9), Kroenke & Spitzer, 2002; generic rating scale for previous treatment experiences, treatment expectations, and treatment effects (GEEE), Rief et al., 2021; Generalized Anxiety Disorder Screener (GAD-7), Spitzer et al., 2006] and received a debriefing. The experiment lasted approximately 20 min, and participants could take part in a raffle for vouchers (10 × 30 euros). In addition, we donated 1 euro per participant to an organization for mental health.

Indirect Measures

Therapy SC-IAT

Both SC-IATs were built in SoSci Survey (Leiner, 2019) and followed the setup of Karpinski and Steinman (2006), illustrated in Fig. 1. Participants (N = 25) who did not take part in the present study pre-rated 25 words for each category of the Therapy SC-IAT (more details in the supplementary material). From these word ratings, we chose five words each for the attribute categories effective and unhelpful and the target category psychotherapy. All items were presented in black Arial font on a white background. Target and attribute category labels were displayed at the bottom of the screen. We counterbalanced the order of the psychotherapy-effective block (psychotherapy + effective | unhelpful) and the psychotherapy-unhelpful block (psychotherapy + unhelpful | effective) across participants. Each block contained 24 practice trials and 72 test trials. Words were presented in the middle of the screen in a 7:7:10 ratio to minimize different frequencies of key presses. Thus, 58 % of correct responses were mapped onto the key that was associated with a target and an attribute category, while only 42 % of correct responses were mapped onto the key that was associated with one attribute category only. Participants had to categorize the presented words into the target and attribute categories as quickly as possible. In each trial, the word remained on the screen until there was a response or for 1,500 ms. When participants did not respond, “Please respond quicker!” appeared for 500 ms on the screen. A green O replaced the stimulus for 150 ms when participants gave a correct response, and a red X replaced it for 150 ms when they gave an incorrect response.
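The reported key-press shares follow from the stimulus ratio. As a minimal sketch, assuming that the 7:7:10 ratio refers to target words, words of the attribute sharing the target's key, and words of the other attribute (this mapping is an interpretation of the text, and all variable names are hypothetical), the percentages can be reproduced as follows:

```python
from fractions import Fraction

# Assumed reading of the 7:7:10 ratio (not stated explicitly in the text):
# 7 target words : 7 words of the attribute on the shared key : 10 words of the other attribute
target, attr_shared_key, attr_other_key = 7, 7, 10
total = target + attr_shared_key + attr_other_key  # 24 stimuli per ratio cycle

# Share of correct responses on the key shared by target and one attribute
share_combined_key = Fraction(target + attr_shared_key, total)
# Share of correct responses on the key holding one attribute only
share_single_key = Fraction(attr_other_key, total)

print(f"combined key: {float(share_combined_key):.0%}")  # 58%
print(f"single key:   {float(share_single_key):.0%}")    # 42%
```

Under this reading, a 24-trial practice block and a 72-trial test block correspond to one and three full ratio cycles, respectively.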

Fig. 1 Example of psychotherapy-effective and psychotherapy-unhelpful blocks with two trials each

(Control) Flower SC-IAT

The Flower SC-IAT included a subset of stimuli used by Klauer et al. (2010). Specifically, the attribute categories positive and negative were used, and the target category flowers with five words each. As for the Therapy SC-IAT, we counterbalanced the flower-positive block (flower + positive | negative) and the flower-negative block (flower + negative | positive). The Flower SC-IAT had the same structure as the Therapy SC-IAT.

Data Preprocessing and D-Score Calculation

We followed the SC-IAT outlier exclusion and data preprocessing criteria from Karpinski and Steinman (2006) for both SC-IATs. More details of the calculations are displayed in the supplementary material, and a video explanation of the task and the R script for the D-score calculation are uploaded at OSF. We had to exclude 67 participants in the Therapy SC-IAT and 65 participants in the Flower SC-IAT because of error rates greater than 20 %. For the D-score calculation, we subtracted the average response time of the psychotherapy-effective block (or flower-positive block) from the average response time of the psychotherapy-unhelpful block (or flower-negative block) and divided the difference by the standard deviation of all correct response times. A D-score of zero indicates that the association strength between psychotherapy and effective (or flower and positive) is similar to the association strength between psychotherapy and unhelpful (or flower and negative). A positive D-score in the Therapy SC-IAT indicates more effective than unhelpful associations with psychotherapy (positive psychotherapy OE). A positive D-score in the Flower SC-IAT indicates more positive than negative associations with flowers.
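The core of the D-score described above can be sketched in a few lines. This is a simplified illustration, not the R script uploaded at OSF: it assumes the inputs are already preprocessed correct response times in milliseconds, it uses the sample standard deviation (an assumption), and the function name `d_score` is hypothetical.

```python
from statistics import mean, stdev

def d_score(rt_compatible, rt_incompatible):
    """Simplified SC-IAT D-score.

    rt_compatible:   correct RTs (ms) from the psychotherapy-effective (or flower-positive) block
    rt_incompatible: correct RTs (ms) from the psychotherapy-unhelpful (or flower-negative) block
    Preprocessing (outlier handling, error treatment) is assumed to be done beforehand.
    """
    # Standard deviation over all correct response times from both blocks
    pooled_sd = stdev(list(rt_compatible) + list(rt_incompatible))
    # Positive values: faster responses in the compatible block
    return (mean(rt_incompatible) - mean(rt_compatible)) / pooled_sd

# Toy data: responses are 100 ms faster on average in the compatible block
example = d_score([600, 650, 700], [700, 750, 800])
print(round(example, 2))  # 1.41
```

Dividing by the standard deviation expresses the block difference in units of the participant's own response-time variability, which makes scores more comparable across faster and slower responders.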

Direct Measures

Direct Measures of OE

Credibility Expectancy Questionnaire (CEQ; Devilly & Borkovec, 2000)

As a direct measure of OE, we used a translated version of the CEQ (Koch et al., 2016). We made some adaptations to the original scale because we did not assume that all of our participants had a current problem. In the instruction, we added: “We ask you to indicate below how much you believe psychotherapy would be helpful for mental health problems. Important: If you currently have a problem, please refer to that problem. If you do not currently have a problem, imagine that you do and then answer the following questions”. Also, we excluded the first item (as preregistered, "At this point, how logical does a therapy offered to you seem?"). Participants answered three items on a 9-point scale from 1 (e.g., not helpful at all) to 9 (e.g., very helpful) and two items on an 11-point scale from 0 to 100 %. We transformed the latter two items to a 9-point scale for analysis to calculate a total mean score. Three items assessed how participants think, and two items assessed how participants feel about psychotherapy's effectiveness in helping with a personal or hypothetical mental health problem. We calculated a total mean score (range = 1–9) because Devilly and Borkovec (2000) demonstrated high internal consistency (α = 0.84–0.85) and reasonable construct validity for this mean score.

Attitudes Toward Seeking Professional Psychological Help-Short Form (ATSPPH; Fischer & Farina, 1995)

Participants completed a translated version of the ATSPPH (Coppens et al., 2013), which measures psychological help-seeking attitudes. Participants answered ten items (e.g., “If I believed I was having a mental breakdown, my first inclination would be to get professional attention”) on a 4-point Likert-type scale ranging from 0 (disagree) to 3 (agree). We reported a total sum score (range = 0–30). Fischer and Farina (1995) demonstrated for this sum score high one-month test–retest reliability (rtt = 0.80), high internal consistency (α = 0.84), and a moderate correlation (r = 0.39) to previous experiences with professional help.


Participants responded to the questions “To what extent do you think of psychotherapy as effective or unhelpful?” (Silverman et al., 2021) and “To what extent do you think of flowers as positive or negative?” on a 7-point Likert-type scale ranging from -3 (extremely unhelpful or negative) to 3 (extremely effective or positive). These questions were used previously in IAT studies to assess participants’ self-reported judgments of how strongly the IAT’s labels are related (Greenwald et al., 1998).

Exploratory Variables

Patient Health Questionnaire (PHQ-9; Kroenke & Spitzer, 2002)

As a measure of depressive symptoms in the last 14 days, we used a translated version of the PHQ-9 depression module (Gräfe et al., 2004). The PHQ-9 contains nine questions with a 4-point Likert-type scale (0 not at all to 3 nearly every day). We used a sum score (range = 0–27), for which Gräfe et al. (2004) showed high internal consistency (α = 0.88) and satisfactory discriminant and criterion validity with a clinical screening cutoff of ten.

Generalized Anxiety Disorder Screener (GAD-7; Spitzer et al., 2006)

As a measure of anxiety symptoms, we used a translated version of the GAD-7 (Löwe et al., 2008). The GAD-7 contains seven questions with a 4-point Likert-type scale (0 never to 3 nearly every day). We used a sum score (range = 0–21), for which previous studies showed high test–retest reliability (rtt = 0.83), high internal consistency (α = 0.89–0.92), and good construct validity with a clinical screening cutoff of ten (Löwe et al., 2008; Spitzer et al., 2006).

Generic Rating Scale for Previous Treatment Experiences, Treatment Expectations, and Treatment Effects (GEEE; Rief et al., 2021)

As a measure of previous psychotherapy experiences, we assessed the fourth item of the GEEE. Participants had to indicate if they experienced psychotherapy treatment never, daily, more than 10 days, 5 to 10 days, 1 to 4 days, or not during the last 12 months but before. The scale was initially designed for medication treatment, so we transformed the scale into a binary variable that is better suited for studying psychotherapy experiences (0 = have not been in psychotherapy or no experience [never], 1 = have been in psychotherapy previously or some experience [all other categories]).
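The binary recoding can be illustrated as follows. The category labels are taken from the response options listed above, but their spelling as strings and the function name `recode_experience` are hypothetical choices for this sketch.

```python
# Response options of the fourth GEEE item as listed in the text (string labels are assumed)
GEEE_OPTIONS = {
    "never",
    "daily",
    "more than 10 days",
    "5 to 10 days",
    "1 to 4 days",
    "not during the last 12 months but before",
}

def recode_experience(response: str) -> int:
    """Return 0 for no psychotherapy experience ('never'), 1 for all other categories."""
    if response not in GEEE_OPTIONS:
        raise ValueError(f"Unknown response option: {response!r}")
    return 0 if response == "never" else 1

print(recode_experience("never"))         # 0
print(recode_experience("5 to 10 days"))  # 1
```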

Therapy Motivation

As a rating of motivation to do psychotherapy, we used one item (“How motivated would you be to do psychotherapy to work on your problems?”). Participants answered this item using a visual analog scale (not at all motivated or 0 % to fully motivated or 100 %).

Statistical Analysis

We conducted all analyses using R version 4.1.0 (R Core Team, 2022).

Validation of the Therapy SC-IAT

Internal Consistency and Mean Scores

We checked the internal consistency for our direct measures (CEQ and ATSPPH) by calculating Cronbach’s alpha. For the internal consistency calculation of the SC-IATs, we used the same approach as Karpinski and Steinman (2006) to be able to compare the results. We removed the 24 practice trials, calculated D-scores for each third of the test trials (24 trials each), and then calculated Spearman-Brown corrected correlations (adjusted r = 3*r/[1 + (3 − 1)*r]) between the thirds. We conducted one-sample t-tests (two-tailed) to examine whether the means of OE and flower associations significantly differed from zero (indirect measures and self-reports) or the mean of the scale (CEQ: 5, ATSPPH: 15).
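The Spearman-Brown correction given above can be written as a small helper. This is a minimal sketch; the function name is hypothetical, and how the correlations between the three thirds are combined into a single r before the correction (e.g., averaging the pairwise correlations) is not specified here and is left to the caller.

```python
def spearman_brown(r: float, k: int = 3) -> float:
    """Spearman-Brown corrected reliability for a test lengthened by factor k.

    With k = 3 this is the formula from the text: adjusted r = 3*r / [1 + (3 - 1)*r].
    """
    return k * r / (1 + (k - 1) * r)

# Example: a correlation of 0.40 between thirds corresponds to an adjusted r of about 0.67
print(round(spearman_brown(0.40), 2))  # 0.67
```

The correction estimates the reliability of the full 72-trial measure from correlations between its 24-trial parts, which on their own would underestimate it.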

Convergent and Discriminant Validity

For the convergent and discriminant validity, we computed Pearson correlations with two-tailed tests. We used p-values < 0.05 as the criterion for statistically significant results in these analyses. We added motivation scores as an exploratory factor in the correlation analyses (not preregistered).

Regression Analysis

For our exploratory regression analysis, we tested each predictor individually with partial F-tests, with the indirectly and directly measured OE as dependent variables. We used Bonferroni-corrected p-values < 0.0083 (preregistered as the corrected p-value for six comparisons within the linear regression) as the criterion for statistically significant results.
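The corrected threshold follows from dividing the significance level by the number of comparisons. The sketch below assumes an uncorrected level of 0.05, which matches the criterion used for the correlation analyses; the helper name `is_significant` is hypothetical.

```python
alpha = 0.05        # uncorrected significance level (assumed)
n_comparisons = 6   # number of comparisons named in the preregistration

# Bonferroni correction: divide alpha by the number of comparisons
alpha_corrected = alpha / n_comparisons
print(round(alpha_corrected, 4))  # 0.0083

def is_significant(p_value: float) -> bool:
    """Compare a p-value against the Bonferroni-corrected threshold."""
    return p_value < alpha_corrected

print(is_significant(0.005))  # True
print(is_significant(0.014))  # False
```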

Incremental Validity (Exploratory Analysis of Experiences with Psychotherapy)

For the exploratory analyses of the incremental validity (not preregistered), we used likelihood-ratio tests, in which we compared a model with an effect of interest to a model without the effect of interest (nested models). We examined if adding the indirect measure to the direct measures as predictors for experiences with psychotherapy leads to a significantly better model.

Bayes Factors

To be able to state evidence for the null hypothesis and to lower the risk of false-positive results (Wagenmakers et al., 2011; Wetzels et al., 2011), we additionally calculated Bayes factors (BF; BayesFactor package, Version 0.9.12–4.2; Morey & Rouder, 2018). We used default priors (ttestBF: rscale = √2/2; correlationBF: rscale = 1/3; lmBF: rscaleFixed = 1/2) and increased the number of samples to 100,000. The Bayes factor expresses the probability of the data (marginal likelihood) under one hypothesis relative to another hypothesis (Jeffreys, 1961; Kass & Raftery, 1995). We reported BF01, which indicates evidence in favor of the null hypothesis, and BF10, which indicates evidence in favor of the alternative hypothesis (BF10 = 1/BF01). Bayes factors were interpreted following Jeffreys (1961): values between 1 and 3 (or 1.00–0.33) as anecdotal evidence, values between 3 and 10 (or 0.33–0.10) as moderate evidence, values between 10 and 30 (or 0.10–0.03) as strong evidence, values between 30 and 100 (or 0.03–0.01) as very strong evidence, and values > 100 (or < 0.01) as extreme evidence.
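The interpretation scheme can be expressed as a small lookup. This is an illustrative sketch rather than part of the BayesFactor package; the function names are hypothetical, and assigning boundary values (e.g., exactly 3) to the higher category is an assumption, since the text does not specify it.

```python
def bf10_from_bf01(bf01: float) -> float:
    """Convert evidence for the null into evidence for the alternative (BF10 = 1/BF01)."""
    return 1 / bf01

def evidence_strength(bf: float) -> str:
    """Label the strength of evidence for a Bayes factor given in either direction."""
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    # Values below 1 favor the other hypothesis; invert them to use one set of thresholds
    strength = bf if bf >= 1 else 1 / bf
    if strength < 3:
        return "anecdotal"
    if strength < 10:
        return "moderate"
    if strength < 30:
        return "strong"
    if strength <= 100:
        return "very strong"
    return "extreme"

print(evidence_strength(2.76))   # anecdotal
print(evidence_strength(7.77))   # moderate
print(evidence_strength(77.96))  # very strong
```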


Validation of the Therapy SC-IAT

Internal Consistency and Mean Scores

The Therapy SC-IAT (adjusted r = 0.67) and the Flower SC-IAT (adjusted r = 0.62) demonstrated reasonable internal consistency in this study, with reliability estimates that were comparable to other SC-IAT studies (Hyde et al., 2010: adjusted r = 0.73; Karpinski & Steinman, 2006: adjusted r = 0.55–0.85; Rebar et al., 2015: adjusted r = 0.73–0.84). The average error rates in the test trials (after exclusion) were 8.2 % (SD = 4.5 %) for the Therapy SC-IAT and 8.4 % (SD = 4.4 %) for the Flower SC-IAT. Table S2 in the supplemental material displays the mean reaction times. The CEQ and ATSPPH demonstrated good internal consistency (CEQ: α = 0.89; ATSPPH: α = 0.74). As predicted, both the indirect and direct measures showed positive OE, with Bayes factors demonstrating extreme evidence for these effects (Table 1). Flowers were significantly more strongly associated with positive than negative in both the Flower D-scores and the Flower self-reports, with Bayes factors demonstrating extreme evidence.

Table 1 Mean scores (SD) and test statistics of the one-sample t-tests of indirect and direct measures

Convergent Validity—Correlations of Indirect and Direct Measures

Concerning the predicted convergent validity, the Therapy SC-IAT correlated positively with all direct measures of OE except the CEQ (Table 2). Bayes factors showed anecdotal evidence for these correlations (Therapy self-report: BF01 = 1.42; ATSPPH: BF10 = 1.10; CEQ: BF01 = 1.81; motivation: BF10 = 2.76).

Table 2 Correlations of study variables

Discriminant Validity—Correlations Between OE and the Flower Measures

Concerning the predicted discriminant validity, the Therapy SC-IAT did not significantly correlate with the Flower self-reports, with anecdotal evidence for the null effect (BF01 = 1.95). In addition, the Flower SC-IAT did not significantly correlate with the direct measures of OE, with moderate to strong evidence for the null effects (Therapy self-reports: BF01 = 7.77, ATSPPH: BF01 = 10.62, CEQ: BF01 = 10.12, motivation: BF01 = 9.54). As predicted, the Flower SC-IAT positively correlated with the Therapy SC-IAT (method-specific variance), with very strong evidence for the effect (BF10 = 77.96).

Regression Analysis

Therapy D-scores were significantly predicted by age, with older age associated with more positive Therapy D-scores (Table 3). Flower D-scores were significantly predicted by age and gender, with older age and women (vs. men) associated with more positive Flower D-scores.

Table 3 Regression analysis of therapy D-scores

Therapy self-report and CEQ scores were significantly predicted by age, with older age associated with less positive directly measured OE (Table 4; Table S3 in the supplementary material displays the mean scores). Therapy self-reports, CEQ, and ATSPPH scores were significantly predicted by gender and experience with psychotherapy. Women (vs. men) and participants with (vs. without) psychotherapy experience showed more positive directly measured OE.

Table 4 Regression analysis of direct measures of outcome expectations

Incremental Validity (Exploratory Analysis of Experiences with Psychotherapy)

In exploratory analyses, we tested a model including only the direct OE measures (self-reports, CEQ, ATSPPH) against a model additionally including the indirect measure (Therapy D-scores) as a statistical predictor of experiences with psychotherapy. The second model, including the Therapy D-scores, described the data significantly better [χ2(1) = 6.02, p = 0.014], indicating that the indirect measure of OE was positively associated with having (vs. not having) been in psychotherapy even when we controlled for variance explained by the direct measures.

As a control analysis, we tested a model including only the direct measures of OE (self-reports, CEQ, ATSPPH) against a model additionally including the Flower D-scores as a statistical predictor of experiences with psychotherapy. The second model, including the Flower D-scores, did not describe the data significantly better [χ2(1) = 0.31, p = 0.58], indicating that there was no association between Flower D-scores and whether people have or have not been in psychotherapy, controlling for the direct measures of OE. Figure 2 displays mean indirect and direct scores dependent on experience with psychotherapy.
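The p-values of both likelihood-ratio tests can be cross-checked from the reported χ2 statistics. For one degree of freedom, the upper-tail probability of the χ2 distribution equals erfc(√(x/2)), so the check needs only the standard library. This is a verification sketch with a hypothetical function name, not the analysis code used in the study.

```python
from math import erfc, sqrt

def chi2_pvalue_df1(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("A chi-square statistic cannot be negative.")
    return erfc(sqrt(statistic / 2))

# Likelihood-ratio statistics reported in the text
p_therapy = chi2_pvalue_df1(6.02)  # model with Therapy D-scores
p_flower = chi2_pvalue_df1(0.31)   # control model with Flower D-scores

print(round(p_therapy, 3))  # 0.014
print(round(p_flower, 2))   # 0.58
```

The statistic itself is twice the difference in log-likelihood between the nested models; one degree of freedom reflects the single added predictor.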

Fig. 2 Mean indirect and direct scores dependent on experience with psychotherapy. Self-Report = Self-Report Scores (range = -3–3); ATSPPH = Attitudes Toward Seeking Professional Psychological Help (range = 0–30); CEQ = Credibility Expectancy Questionnaire (range = 1–9). Error bars indicate ± 1 SE


To the best of our knowledge, this is the first study that developed and validated an online SC-IAT to indirectly measure the OE of psychological treatments. We conducted the Therapy SC-IAT, a (control) Flower SC-IAT, and three direct measures of OE in a large sample. The Therapy SC-IAT correlated with the direct measures of OE (except the CEQ; convergent validity) and did not correlate with measures of flower associations (discriminant validity). Furthermore, the indirect OE were positively associated with people who have (vs. have not) been in psychotherapy, even when we controlled for the direct measures of OE, indicating evidence for incremental validity of the Therapy SC-IAT.

In line with our hypotheses, psychotherapy was more strongly associated with effective than unhelpful, which ties well with the previous study of Silverman et al. (2021), indicating more psychotherapy + effective associations compared to medication + effective associations. Our findings extend the previous evidence because we demonstrated positive indirectly measured OE independent of another reference category (e.g., medication in the study by Silverman et al., 2021). Furthermore, the Therapy SC-IAT was positively associated with the Flower SC-IAT due to the method-specific variance. Also, the Therapy SC-IAT was not related to the direct measures of flower association, indicating the discriminant validity of the Therapy SC-IAT.

Moreover, the expected associations between the Therapy SC-IAT and the direct measures of OE (convergent validity) were significant (except with the CEQ) but relatively low, with the Bayes factor indicating only anecdotal evidence. Such relatively low correlations between indirect and direct measures have been found in other areas, for instance, with a previous Race SC-IAT (Karpinski & Steinman, 2006), a homosexuality SC-IAT (Breen & Karpinski, 2013), or anxiety IATs (Egloff & Schmukle, 2002, 2004; Gschwendner et al., 2008). Notably, the correlations between the Flower SC-IAT and the direct OE measures were non-significant. Because we consider the correlations between the Therapy SC-IAT and the direct measures of OE to be unexpectedly low, we discuss four possible reasons for this finding and ways to improve the indirect-direct correlations.

First, we shed light on the D-score calculation recommended by Karpinski and Steinman (2006). Using high error rates (> 20 %) as an exclusion criterion resulted in high exclusion rates (11.3–11.6 % of all participants), which were comparable to other studies (5.4–13.6 %; Karpinski & Steinman, 2006). The current literature on IATs recommends including even participants with high error rates in the D-score calculation (Greenwald et al., 2003, 2021). However, a previous SC-IAT study was able to filter out participants who had been instructed to fake their responses by excluding those with high error rates (Karpinski & Steinman, 2006). Therefore, the error exclusion criterion might help filter out participants pretending to hold a different attitude, for instance, participants who want to report positive OE despite holding negative OE. In the supplementary material, we provide all analyses with the alternative D-score calculation recommended by Greenwald et al. (2021). With this scoring, the associations between the Therapy D-scores and the CEQ turned significant, and the associations between the Therapy D-scores and all direct measures of OE slightly increased (by r = 0.00–0.04, resulting in r = 0.08–0.13 with the Greenwald scoring).
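To make the role of the error-rate exclusion concrete, a D-score calculation of this kind can be sketched as follows. This is a simplified illustration and not the exact scoring script used in this study: the function name, the trial format, the 400 ms error penalty, and the use of the standard deviation of all correct responses as the divisor are assumptions. The 20 % error cutoff, the 350 ms lower bound, and the 1,500 ms response window are the values mentioned in the text.

```python
from statistics import mean, stdev


def sc_iat_d_score(compatible, incompatible, *, min_rt=350, max_rt=1500,
                   error_penalty=400, max_error_rate=0.20):
    """Return a D-score, or None if the participant exceeds the error cutoff.

    Each trial is a (reaction_time_ms, is_correct) tuple.
    """
    def within_window(trials):
        # Drop too-fast responses and responses outside the response window.
        return [(rt, ok) for rt, ok in trials if min_rt <= rt <= max_rt]

    comp, incomp = within_window(compatible), within_window(incompatible)
    all_trials = comp + incomp

    # Exclude participants whose overall error rate is above the cutoff.
    error_rate = sum(not ok for _, ok in all_trials) / len(all_trials)
    if error_rate > max_error_rate:
        return None

    def penalized(trials):
        # Assumption: errors are replaced by the block's correct mean plus a penalty.
        correct_mean = mean(rt for rt, ok in trials if ok)
        return [rt if ok else correct_mean + error_penalty for rt, ok in trials]

    # Assumption: the difference is scaled by the SD of all correct responses.
    correct_rts = [rt for rt, ok in all_trials if ok]
    return (mean(penalized(incomp)) - mean(penalized(comp))) / stdev(correct_rts)


# Hypothetical trials: faster responses in the compatible block yield a positive D-score.
compatible = [(600, True), (620, True), (580, True), (640, True), (560, True)]
incompatible = [(700, True), (720, True), (680, True), (740, True), (660, True)]
print(sc_iat_d_score(compatible, incompatible))
```

Setting `max_error_rate=1.0` in this sketch retains all participants regardless of their error rate, which loosely mirrors the difference between excluding and including high-error participants discussed above.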

Second, we observed that the convergent and discriminant validity of the Therapy SC-IAT increased when participants completed the Therapy SC-IAT before the Flower SC-IAT (Tables S9 and S10 in the supplemental material display the correlations). To tackle these order effects, we randomized the order of both SC-IATs. Also, using an additional Flower SC-IAT in this study was relevant for validation. However, in further studies investigating OE, the Therapy SC-IAT can be used without other indirect measures, likely increasing indirect-direct correlations. Also, to obtain higher convergent and discriminant validity, we recommend implementing the Therapy SC-IAT at the beginning of studies investigating OE.

Third, the convergent and discriminant validity of the Therapy SC-IAT was higher in participants without a current self-reported psychological disorder than in participants with one (Tables S11 and S12 in the supplemental material display the correlations). The SC-IAT uses reaction times, which can be altered by cognitive load or cognitive impairments, both of which are common in many psychological disorders. Consequently, it might be difficult for patients to respond to the presented words within the response window of 1,500 ms. In this study, a high proportion of participants reported having a current psychological problem (53.3 % in the indirect sample), while our Therapy SC-IAT still demonstrated reasonable reliability and validity. In addition, struggling with a current disorder did not influence the number of missing responses (> 1,500 ms), fast responses (< 350 ms), or error rates (all ps > .05). At this stage of understanding, we believe that the Therapy SC-IAT might be applicable to most patients with psychological disorders. However, including people with various psychological disorders increases the variance in the result patterns, as further underpinned by the differences we found across disorders (see supplemental material), which calls for disorder-specific analyses in future studies.

Last, we discuss the construct validity. The SC-IAT was developed to measure associations, meaning that the Therapy SC-IAT measured associations between psychotherapy and effective. It is unclear whether this is equivalent to a specific OE. To hold an OE, a person must not only associate psychotherapy with effectiveness but also expect it to be effective for their personal problem. The Therapy SC-IAT could instead have measured a global attitude (Karpinski & Steinman, 2006), which could overlap with help-seeking and motivation, as reflected in the associations we found with these constructs. Future studies could use the labels “I find effective” and “I find unhelpful” instead of effective and unhelpful for a more personalized SC-IAT (see Olson & Fazio, 2004), which might result in higher correlations between direct and indirect measures.

Nevertheless, we would like to point out that we did not aim for a 1:1 overlap of the indirect and direct measures. Instead, we wanted to develop an indirect measure that adds value to the measurement of OE and the prediction of therapy-relevant variables. In summary, although the indirect-direct correlations turned out lower than expected, this should not necessarily be interpreted as evidence against the Therapy SC-IAT's validity (Stieger et al., 2010). The developed Therapy SC-IAT might provide a useful complementary measure, and we recommend further investigating its validity considering the discussion points above.

Demographic Differences and the Influence of Experiences with Psychotherapy

Our exploratory regression analyses revealed differences in indirectly and directly measured OE dependent on age, gender, and experience with psychotherapy. In all direct measures, more positive OE were associated with younger age, which aligns well with some studies (McHugh et al., 2013) but contradicts others (Vîslă et al., 2019). Younger participants could attach less stigma to seeking psychological treatments and expect them to be more effective (Silverman et al., 2021). In both indirect measures, we found the opposite pattern: older age was associated with more positive therapy and flower associations. However, this association could be driven by age-related slowing in the incompatible block (psychotherapy-unhelpful and flower-negative block) compared to the compatible block (psychotherapy-effective and flower-positive block), caused by declining cognitive abilities with older age (Hummert et al., 2002; Sherman et al., 2008).

In addition, women reported more positive OE than men, aligning with many other studies (Cohen et al., 2015; McHugh et al., 2013; Seewald & Rief, 2023; Silverman et al., 2021; Vîslă et al., 2019). This association could again have social reasons, as going to therapy fits stereotypes about women more than stereotypes about men (e.g., talking about emotions; Silverman et al., 2021).

Our exploratory evidence highlighted that more positive indirect and direct OE were associated with having been (vs. not having been) in psychotherapy, replicating previous findings (Goguen et al., 2016; Silverman et al., 2021). Flower D-scores did not differ between participants with and without previous psychotherapy experience, which indicates that the differences in the Therapy D-scores are not driven by method-specific variance shared between the Flower SC-IAT and the Therapy SC-IAT. Overall, the association between more positive OE and previous psychotherapy experience can be interpreted in two directions since we only have cross-sectional data. We can speculate that positive OE might have led to seeking psychotherapy in the past. Notably, the Therapy SC-IAT might add value in predicting this behavior. Alternatively, experience with psychotherapy could have led to more positive expectations (Ladwig et al., 2014; MacNair-Semands, 2002; Silverman et al., 2021; ten Have et al., 2010). Our exploratory findings have the potential to inspire new theories in a bottom-up, data-driven way. Experimental or longitudinal studies should further disentangle the relationship between OE and experiences with psychotherapy while considering different psychological disorders.

Limitations and Future Directions

In the following, we discuss two possible limitations of this study. First, even though we pretested our words for the SC-IAT in an independent study sample and achieved high typicality and comparable frequency of the chosen words for our target and attribute categories, it was impossible to rule out word length differences. Psychotherapy words were longer than effective and unhelpful words, which might have influenced SC-IAT scores (Greenwald et al., 2021). However, since we did not have to use another reference category, possible word-length effects would be equally distributed across the psychotherapy-effective and psychotherapy-unhelpful blocks in the Therapy SC-IAT, making unwanted biases unlikely.

Second, the generalizability of our results is limited by our study sample. Almost 1/3 of our sample had therapy experience, and 1/2 had a current psychological problem. We donated one euro to a mental health organization for each participation, which could have attracted more participants who already had experiences with psychological disorders or psychotherapy. Moreover, we did not ask about the participants' ethnic background, but almost everyone had German citizenship and a high level of education. Based on previous studies, ethnic background could influence OE (Silverman et al., 2021; Zhou et al., 2019). Furthermore, mental health systems vary tremendously across countries. Since expectations can develop from experiences (Ladwig et al., 2014; ten Have et al., 2010), we assume that OE depend on the mental health system of the specific country. Therefore, our results should be extended with a heterogeneous sample, including participants of different ethnic backgrounds and educational levels, investigating direct and indirect measures of OE under different healthcare systems.

This study is the first to develop an indirect measure of OE. In the future, researchers should try to answer the following questions: (1) In contexts where social desirability plays a central role or where participants are unable or unwilling to tell their therapist their OE, can indirect measures identify negative OE better than direct measures? (2) Can indirect measures predict help-seeking, health behavior, and outcomes better than direct measures? To address these questions, our developed Therapy SC-IAT should be implemented in experimental and longitudinal designs.


In conclusion, this study provides a first stepping stone toward indirectly measuring patients' expectations in psychotherapy. We suggest that the developed SC-IAT could provide a valuable indirect add-on to the direct measures since positive indirect OE were associated with previous psychotherapy experience, even when we controlled for variance explained by the direct measures of OE, indicating incremental validity of the Therapy SC-IAT. However, future studies should further investigate the indirect measure's reliability and validity in a clinical context, considering the discussed influences on the indirect-direct correlations. With this study, we aim to contribute to a more comprehensive measurement of OE in psychotherapy.