Introduction

Pathological anxiety results from excessive fear of specific situations, memories, and thoughts and causes impairing distress and avoidance of these feared situations (American Psychiatric Association, 2013; Yang et al., 2021). Fear-based symptom clusters in anxiety are characterized by excessive threat responses and include somatic hyperarousal symptoms prominent in panic disorder, social phobia, specific phobia, obsessive compulsive disorder, and posttraumatic stress disorder (Craske et al., 2009; Levin-Aspenson et al., 2021). These clinical presentations differ from distress disorders, such as generalized anxiety disorder, that feature worry and rumination symptoms and show more overlap with depression (Watson, 2005).

Anxiety has long been conceptualized as a disorder of maladaptive learning (Mowrer, 1939; Rachman, 1977; Zinbarg et al., 2022), in part due to the relative effectiveness of learning-based treatments, such as exposure therapy. Yet, exposure therapy and other psycho- and pharmacotherapies are only moderately effective in treating anxiety, and many people do not respond to or are not able to access treatment (Bandelow et al., 2017; Eddy et al., 2004; Watts et al., 2013). Recent advances in computational learning theory and neuroscience of learning (Craske et al., 2014; Levy & Schiller, 2021; Pine, 2017; Pulcu & Browning, 2019) have the potential to greatly improve our understanding and ultimately treatment and prevention of anxiety disorders. To benefit from these advances, we need to fully understand how treatments, such as exposure therapy, change learning, how learning is altered in pathological anxiety, and how contemporary computational learning theory can inform treatment. In particular, we need to understand how uncertainty-related behavior and neural function are disrupted in anxiety and how these disruptions can be remediated with treatment.

Current, cognitive-behavioral conceptualizations view anxiety as due to excessive learned threat expectancy, maintained by avoidance of feared outcomes and remediated by deliberately encountering anxiety-producing situations (Craske et al., 2008; Foa & Kozak, 1986). Yet, laboratory studies of learning in anxiety have failed to consistently find enhanced aversive conditioning or excessive valuation of aversive stimuli (Duits et al., 2015), and many techniques used in exposure therapy are inconsistent with excessive learned threat as the root of anxiety. Instead, uncertainty-related disruptions often are seen in anxiety (Carleton, 2016; Grupe & Nitschke, 2013; Piray & Daw, 2021; Pulcu & Browning, 2019). However, existing theoretical frameworks do not explain how dysfunctional learning and decision-making under uncertainty create avoidance and are remediated with exposure therapy. We focus on exposure therapy, because it is hypothesized to causally manipulate learning in anxiety; however, a number of other effective treatments for anxiety exist, including antidepressants. While we discuss human studies, they need to be considered in the context of the large animal literature on threat conditioning, avoidance, and their pharmacological manipulations (for reviews see Borsini et al., 2002; Harro, 2018; LeDoux et al., 2017; Maren, 2001, 2011; Quirk & Mueller, 2008).

Here, we consider the clinical observations of exposure therapy alongside neurocomputational theories of uncertainty and advance a view of maladaptive learning in anxiety based on uncertainty-sensitive learning and exploration. We propose 1) that maladaptive uncertainty learning is central to fear-based anxiety disorders, 2) that this disrupted uncertainty learning leads to dysfunctional explore/exploit decisions in aversive environments to cause impairing avoidance, and 3) that successful treatment, particularly exposure therapy, remediates these dysfunctions. Reconceptualizing treatment in this way will improve our understanding of how therapies for anxiety disorders work, rendering them more effective.

Current state of exposure therapy

Learning principles and development of exposure therapy

Mary Cover Jones first applied learning theory to reduce anxiety in the 1920s (Jones, 1924a, 1924b). Building on the studies of conditioning of John Watson and others, she showed that some learning techniques successfully reduced fear in children with various phobias. Exposure therapy techniques began to be systematically developed decades later, including systematic desensitization (Wolpe, 1961) and flooding/implosive therapy (Ayer, 1972; Cooper & Clum, 1989; Keane et al., 1989), informed by the laboratory work of Mowrer and others (Mowrer, 1951). The theoretical principles behind exposure therapy were then elaborated as emotional processing theory by Edna Foa and colleagues (Foa & Kozak, 1986). Emotional processing theory proposed that exposure therapy updated the maladaptive emotional content of memory structures through repeated experience with situations activating these structures. Across all of these approaches, the core steps of exposure therapy are clear: first, feared situations are identified, and then patients are guided through structured exposures. In these exposures, they engage in feared situations until an (often predetermined) stopping point without engaging in avoidance, escape, or safety behaviors.

Two points stand out from this foundational work: first, early theories of exposure used a behaviorist black box approach to fear and anxiety. In this framework, any fears result from conditioning by the external environment: anxiety, whether adaptive or not, reflects the presence of learned threat, maladaptive anxiety results from excessive learned threat, and so repeated exposures correct this excessive threat. This view does not account for individual differences or internal, cognitive factors bearing on how information from the environment is processed. Although theories of anxiety and exposure may include internal, cognitive processes (Foa & Kozak, 1986; Lovibond et al., 2008), acknowledge the role of proximal and distal risk factors conferring individual differences (Zinbarg et al., 2022), and note that these individual differences likely influence learning (Craske et al., 2014), the assumption remains that exposure therapy works through normative-learning processes. Based on this assumption, reducing maladaptive fear and avoidance in pathological anxiety is equivalent to reducing experimentally conditioned fear and avoidance in the laboratory.

Second, despite the clear and ongoing inspiration from learning principles, developing exposure as an effective treatment required much trial and error. Jones’ work reviews several approaches that she tried with different children, all well informed by learning principles, and notes several that were unsuccessful and even harmful (Jones, 1924a). Similarly, Wolpe’s description of systematic desensitization is clearly informed by testing different approaches with his patients (Wolpe, 1961). Exposure therapy, therefore, did not neatly derive from learning theory; rather, it is the result of extensive experimentation and refinement. Why was behaviorist learning theory alone insufficient for developing an effective treatment? As we will review in the next paragraphs, this trial-and-error development revealed that many approaches that should work according to basic conditioning theories alone are not effective, while the approaches that proved to be effective require explanation beyond that provided by basic learning explanations.

What is missing from our understanding of exposure?

Exposure therapy techniques underwent several empirical modifications after their initial development. For example, relaxation training was central to systematic desensitization and other early exposure formulations (Wolpe, 1961); relaxation was thought to serve as a competing response to inhibit and extinguish learned associations between anxiogenic stimuli and autonomic arousal. Yet, empirical benefits of relaxation training are limited and experts now recommend against this component (Abramowitz, 2013; Rothbaum & Schwartz, 2002). Another major challenge to emotional processing theory in particular involved the time course of fear reduction. Emotional processing theory makes a specific prediction that incorporating corrective information into fear structures leads to fear reduction both within and between each exposure session. However, only between-session fear reduction predicts treatment outcome, necessitating a reconceptualization of when and how symptom change relates to learning (reviewed in Asnaani et al., 2016; Craske et al., 2008).

Attempts to augment normative learning processes during exposure therapy with learning-informed pharmacological, somatic, or behavioral interventions have not been particularly successful. D-cycloserine, a partial NMDA receptor agonist with effects on fear conditioning, shows mixed results when used during exposure sessions; the number and timing of doses may moderate response (Hofmann, 2014; Mataix-Cols et al., 2017; Norberg et al., 2008), but targeted, preregistered tests of proposed moderators failed to demonstrate an augmenting effect (Smits et al., 2020). Transcranial magnetic stimulation (TMS) or transcranial direct current stimulation (tDCS) to increase or decrease activity in learning-related cortical regions during exposure have not been consistently effective (Bulteau et al., 2022; Cobb et al., 2021; Isserles et al., 2021; Zaizar et al., 2021). Behavioral approaches that normatively enhance extinction, such as novelty-facilitated extinction (Dunsmoor et al., 2015), do not improve exposure therapy outcomes in people with anxiety (Steinman et al., 2022). These translational failures occurred despite robust preclinical findings: these approaches consistently change normative learning, yet do not improve treatment outcomes. The disappointing outcomes of these exposure augmentation strategies have several possible explanations, but they suggest that the learning processes responsible for symptom improvement are not successfully modified by strategies that augment normative learning.

Current theories of exposure also do not account for important treatment components. The inability to overcome avoidance and engage in exposures, especially during initial exposure sessions, is a major barrier to treatment response. Empirically, higher pretreatment avoidance predicts worse response to treatment over and above symptom severity, reductions in avoidance precede other symptom changes during exposure therapy, and avoidance is significantly reduced with successful treatment (Cottraux et al., 1993, 2001; Hansmeier et al., 2021; Salcioğlu et al., 2007; Wheaton et al., 2018). In practice, initial exposure sessions focus more on reducing avoidance than on changing expectations of feared outcomes. Yet, current theories view avoidance as a byproduct of excessive fear, not a target of treatment itself. Avoidance is instead targeted outside of exposure through psychoeducation or motivational techniques. Manuals for exposure therapy practitioners (Foa et al., 2012) emphasize the need to target the patient’s idiographic “core fear,” a notion not found in current theories of exposure. For example, two people with cleanliness-related obsessive-compulsive disorder (OCD) will require different approaches to exposure if one person’s core fear is harm coming to their loved ones from germs and the other’s is losing control of their environment. Successful exposures targeting a core fear will show generalization to other feared situations not directly targeted (such as refraining from cleaning at home generalizing to being able to tolerate messes in one’s child’s classroom, to use the first example). In contrast, exposures that do not target the core fear, despite similar content and patient fear ratings (such as refraining from cleaning a break room at work that the person’s family will never visit), show less generalization and overall symptom reduction. Yet, the necessity of targeting the core fear, over and above tailoring exposures based on patients’ fear ratings, is not addressed by current theories of exposure.

More recently, researchers have begun to tackle some of these problems and have proposed new accounts of learning processes in exposure that account for the need to focus on core fears, as well as a renewed focus on avoidance. Inhibitory learning theory moves beyond a basic focus on extinction and proposes that learning deficits (specifically, failure to inhibit fear memories and to form new, safe contexts) underlie the development of anxiety and poor response to exposure therapy (Craske et al., 2014). Problems separating dangerous and safe contexts also support the need to focus on core fears that share a context, rather than unrelated fears that span contexts. The central role of avoidance, and possible pathways to understand avoidance in anxiety, also have been the focus of recent reviews and theoretical papers (Krypotos, 2015; Pittig et al., 2018, 2020). However, these inconsistencies are yet to be fully integrated into theories of what goes awry in anxiety and how treatments, such as exposure therapy, work.

Uncertainty and anxiety

Current ideas about the role of uncertainty in anxiety

If, as proposed in earlier models, anxiety disorders are disorders of excessive learned threat, people with pathological anxiety should show increased threat conditioning and valuation of negative stimuli in the laboratory. People with anxiety do show behavioral, psychophysiological, and neural differences with paradigms, such as fear conditioning, safety learning, and extinction learning; however, these learning differences are not consistent with the idea of greater aversive conditioning in anxiety. Rather, in threat learning, people with anxiety disorders show elevated responding to safety signals (CS−) during conditioning, greater responses to the cue previously associated with threat (CS+E) during extinction (Duits et al., 2015), and greater generalization of learned threat (Cooper et al., 2022; Fraunfelter et al., 2022), accompanied by reduced ventromedial prefrontal cortex (vmPFC) activation (Marin et al., 2017, 2020). Outside of learning paradigms, people with fear-based disorders, such as panic disorder and posttraumatic stress disorder, show elevated psychophysiological responses to uncertain threat (Gorka et al., 2017; Grillon et al., 2008, 2009; McTeague & Lang, 2012a).

To account for these findings, Grupe and Nitschke (2013) proposed the uncertainty and anticipation model of anxiety (UAMA), where anxiety is characterized by excessive anticipation of uncertain threat. UAMA delineates five areas of dysfunction: inflated estimates of threat costs and probability, hypervigilance, deficient safety learning, behavioral and cognitive avoidance, and heightened reactivity to uncertainty; these dysfunctions relate (in both overlapping and distinct ways) to altered functioning of anterior mid-cingulate cortex, vmPFC, orbitofrontal cortex (OFC), insula, amygdala, dorsomedial and dorsolateral prefrontal cortex, bed nucleus of the stria terminalis (BNST), and periaqueductal gray (PAG), among other areas. Similar to UAMA’s emphasis on uncertain threat, Lissek, Pine, and Grillon’s (Lissek et al., 2006) analysis of “strong” versus “weak” situations (Mischel, 1977) in anxiety emphasizes the role of uncertainty. People with clinical anxiety show intact responding to strong situations with clear threats but show elevated physiological responses to weak situations with greater uncertainty about negative outcomes. According to another theory of intolerance of uncertainty in anxiety, anxiety is due to difficulties tolerating the distress of uncertainty or insufficient information (Carleton, 2016). Crucially, these theories do not propose that threat processing overall is disrupted in anxiety nor that anxiety results from representing all negative outcomes as worse or more likely. Instead, they propose that anxiety is specifically related to difficulties with threat only when it is uncertain.

Precisely defining uncertainty using computational process models

Extant theories converge on uncertainty-related dysfunctions underlying behavioral, physiological, neural, and self-report impairments in pathological anxiety. Still unclear, though, is how these different dysfunctions relate to each other and what generates them: for example, is deficient safety learning due to the same underlying dysfunction as excessive behavioral and cognitive avoidance, or are these separate dysfunctions that co-occur in anxiety? Whether separate or shared, what processes lead to these observed disruptions? In addition, commonly used self-report measures, such as the Intolerance of Uncertainty Scale (Buhr & Dugas, 2002), are vague about the source of uncertainty-related issues. If someone endorses an item, such as “uncertainty makes me uneasy, anxious, or stressed,” does that mean that uncertain threat probabilities cause them anxiety, that they value negative outcomes as more aversive, that they require greater control over their environment, that they have fewer emotion-regulation abilities, or some combination of the above? Similarly, fear conditioning and extinction studies can demonstrate that people with anxiety differ in responses to safe and threat-related stimuli but cannot explain whether these findings are from elevated responding to all cues in a threat-associated environment, difficulty distinguishing safe from threat-related stimuli, or other learning differences. To better understand how uncertainty relates to anxiety pathology, these constructs must be precisely defined and distinguished from each other. Computational models can mathematically define learning and choice processes affected by uncertainty and disambiguate among disruptions in components of these processes (Krypotos et al., 2020; Levy & Schiller, 2021; Montague et al., 2012).

When navigating the world, organisms must integrate and update information about relationships between stimuli and outcomes (learning) and select stimuli based on their possible outcomes (choice). Three primary families of computational models describe these learning and choice processes: descriptive models of choice under ambiguity and risk (Platt & Huettel, 2008), generative sequential sampling models of choice that represent evidence accumulation during decision processes (Forstmann et al., 2016; Ratcliff & McKoon, 2008), and reinforcement learning and similar error-driven value update models of learning (Niv, 2009; Sutton & Barto, 1998; Vilares & Kording, 2011). The first set of models describes choice during “decisions from description” (Hertwig et al., 2004), or how people choose when values and probabilities are explicitly presented to participants. These models are descriptive rather than generative: they can decompose choices into components but cannot describe how these choices are generated. The second set of models, sequential sampling models, are generative models of choice and can describe decisions from either described or learned values and probabilities; however, as these models have not been applied to decisions involving uncertainty in people with anxiety (see Aylward et al., 2020; Price et al., 2019; White et al., 2010, 2016 for nonuncertainty-related studies in anxiety), we will not focus on these models in this review. The third set of models, reinforcement learning and other error-driven value update models, are generative and describe learning and decision-making processes when information about value and probabilities must be learned (“decisions from experience”) rather than described.

These learning and choice models allow us to quantify different types of uncertainty (Bach & Dolan, 2012; Bland & Schaefer, 2012; Nassar et al., 2012; Payzan-LeNestour & Bossaerts, 2011; Yu & Dayan, 2005; Table 1; Fig. 1A). Definitions and distinctions among types of uncertainty vary, but one consistent distinction is between irreducible and reducible uncertainty. Irreducible uncertainty (also called known uncertainty or risk) arises from noisy relationships between stimuli and outcomes (e.g., from a coin flip); it can be precisely described but not reduced. In contrast, reducible uncertainty arises from incomplete knowledge of contingencies between stimuli and outcomes and, as the name implies, can be reduced when learning is possible. Reducible uncertainty can be divided into two types, depending on whether the uncertainty is about contingencies given a stable relationship between stimuli and outcomes (estimation uncertainty) or if it stems from uncertainty about which environment is currently governing stimulus-outcome contingencies or the stability of those contingencies (unexpected uncertainty; also termed changepoint probability or volatility; Nassar et al., 2012; Payzan-LeNestour & Bossaerts, 2011; Piray & Daw, 2021). See Fig. 1B for a representation of irreducible uncertainty, estimation uncertainty, and unexpected uncertainty during a reversal learning task. With high volatility or frequent changepoints causing high unexpected uncertainty, normative learning should show high learning rates, particularly after poorly predicted outcomes, as represented by the Pearce-Hall model of learning (Pearce & Hall, 1980). Meanwhile, with high noise causing high irreducible uncertainty, normative learning should slowly incorporate poorly predicted outcomes with a low learning rate, as represented by the Mackintosh model of learning (Mackintosh, 1975; Piray & Daw, 2021).
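
The opposing effects of volatility and noise on the normative learning rate can be sketched numerically. The example below is a minimal illustration, not the Pearce-Hall or Mackintosh model itself: it assumes a simple Kalman filter as a stand-in, in which the learning rate is the Kalman gain. The function name, parameter names, and numeric values are hypothetical choices made for illustration.

```python
def steady_state_learning_rate(volatility, noise, n_steps=500):
    """Return the converged Kalman gain (the effective learning rate).

    volatility: assumed variance of random drift in the true contingency
                (a stand-in for unexpected uncertainty).
    noise:      assumed variance of outcomes around the true contingency
                (a stand-in for irreducible uncertainty).
    """
    posterior_var = 1.0  # estimation uncertainty; arbitrary starting value
    gain = 0.0
    for _ in range(n_steps):
        # Uncertainty about the contingency grows by the drift variance.
        prior_var = posterior_var + volatility
        # The gain weighs uncertainty about the contingency against outcome noise.
        gain = prior_var / (prior_var + noise)
        # Observing an outcome reduces estimation uncertainty.
        posterior_var = (1.0 - gain) * prior_var
    return gain


baseline = steady_state_learning_rate(volatility=0.1, noise=1.0)
volatile = steady_state_learning_rate(volatility=1.0, noise=1.0)
noisy = steady_state_learning_rate(volatility=0.1, noise=10.0)

print(f"baseline: {baseline:.3f}, volatile: {volatile:.3f}, noisy: {noisy:.3f}")
```

Under these assumptions, raising volatility increases the converged learning rate, whereas raising outcome noise decreases it, which is the qualitative pattern described above.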

Table 1 Key terms
Fig. 1 Types of uncertainty. A. Relationship among uncertainty types. B. Schematic of levels of types of uncertainty during a reversal learning task. Initially, outcome 1 is reinforced more than outcome 2 (reinforcement represented by dots in top part of figure, where x-axis indicates trials). Halfway through, these contingencies reverse, and a second reversal occurs ~75% of the way through. Approximate values of each type of uncertainty at different points in the task are represented in the lower part of the figure. Irreducible uncertainty is initially high before converging on a value close to the true noise level; this form of uncertainty only increases slightly after the changepoints. Estimation uncertainty is initially high, reflecting uncertainty about contingencies. This uncertainty reduces while contingencies are learned but then increases after changepoints, reflecting the need to relearn contingencies once they have changed. As changepoints become more frequent and unexpected uncertainty increases, corresponding increases in learning rate lead to faster reductions in estimation uncertainty between changepoints. Unexpected uncertainty is also initially high and decreases while contingencies are stable. After each changepoint, this uncertainty increases and stays higher, reflecting the increased possibility of additional changepoints.

The relationship between anxiety and these types of uncertainty can be illustrated with an example of someone who fears health problems during a panic attack (outcome) when navigating a crowded mall (stimulus). Irreducible uncertainty refers to how reliably the crowded mall causes a panic attack; it will be high if being in a mall is a poor predictor of a panic attack and low if being in a mall is a definite predictor of having (or not having) a panic attack. Estimation uncertainty represents what is yet to be learned about the probability of the mall causing a panic attack, given the current set of contingencies; it will be high if more experiences being in the mall will lead to a better ability to predict the likelihood of panic attacks, and low if this contingency is already learned as accurately as possible. Unexpected uncertainty refers to the probability that the relationship between being in the mall and having a panic attack has changed. It will be high if the relationship between being in a mall and having a panic attack may have changed (e.g., after a recent panic attack in a similarly crowded restaurant, which may indicate that panic attacks in crowded areas are now more likely) and lower if the relationship between being in a mall and having a panic attack is stable.

Uncertainty engages brain regions involved in salience, value, and cognitive control (for in-depth reviews of neural correlates of uncertainty processing, see Bach & Dolan, 2012; Soltani & Izquierdo, 2019). The dorsal anterior cingulate cortex (dACC) forms the hub of the salience network (Seeley et al., 2007) and is required to adaptively adjust behavior after changepoints (Behrens et al., 2007; Hayden et al., 2011; Kennerley et al., 2006; Nassar et al., 2019; O’Reilly et al., 2013). The dACC and other noradrenergically innervated components of the salience network, including amygdala and insula, respond to surprising information, although whether they selectively signal one form of uncertainty over another is unclear (Bach & Dolan, 2012; Kuhnen & Knutson, 2005; Li et al., 2011; Soltani & Izquierdo, 2019). Serotonin signals from the dorsal raphe nucleus also communicate information about uncertainty (Grossman et al., 2022; Matias et al., 2017). Meanwhile, orbitofrontal, ventromedial, and ventrolateral prefrontal cortex and hippocampus interact with these uncertainty-activated areas to communicate information about context and expected value (Bartra et al., 2013; Schuck et al., 2016; Schuck & Niv, 2019; Walton et al., 2011).

A key advantage of these computational models is that these types of uncertainty have precise and explicit mathematical definitions. For example, the irreducible uncertainty of a stimulus with a perfectly learned 75%/25% contingency can be represented by the variance of a Bernoulli distribution with a 75% probability of an outcome. This value can be used to quantitatively assess the effects of different levels of uncertainty on behavioral or physiological signals.
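
As a worked instance of this definition, the variance of a Bernoulli distribution with outcome probability p is p(1 − p). The short sketch below evaluates it for the 75%/25% contingency mentioned above and for two comparison contingencies; the function name and the comparison values are illustrative assumptions.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli outcome that occurs with probability p."""
    return p * (1.0 - p)


# Irreducible uncertainty for a 75%/25% contingency.
risk_75 = bernoulli_variance(0.75)
# Comparison points: a maximally noisy 50%/50% contingency
# and a deterministic contingency with no irreducible uncertainty.
risk_50 = bernoulli_variance(0.5)
risk_100 = bernoulli_variance(1.0)

print(risk_75, risk_50, risk_100)
```

For the 75%/25% contingency the variance is 0.75 × 0.25 = 0.1875, which lies between the maximum of 0.25 at a 50%/50% contingency and 0 for a deterministic contingency.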

Choice models of decisions from description and findings in anxiety

Choice models of decisions from description are based on prospect theory and related behavioral economic theories (Kahneman & Tversky, 1979). These models measure the effects that risk (irreducible uncertainty), ambiguity (similar to reducible uncertainty, but pertaining to situations where uncertainty cannot be reduced through learning), and valence (loss versus gain) have on whether an option is chosen, independent of value. In these models, the probability of choosing each option is based on its expected value (magnitude of each outcome times its probability); risk aversion indicates a reduced likelihood of choosing options with greater risk, accounting for expected value; similarly, ambiguity aversion indicates a reduced likelihood of choosing options with greater ambiguity, accounting for expected value, while loss aversion indicates a reduced likelihood of choosing options with negative values, accounting for overall expected value. Choices involving risk, ambiguity, and loss engage value-sensitive brain areas such as OFC, vmPFC, and nucleus accumbens as well as salience-related areas like amygdala and insula (Bartra et al., 2013; De Martino et al., 2006; Hsu et al., 2005; Kuhnen & Knutson, 2005; Tom et al., 2007). These paradigms and models do not involve learning. As a result, they do not represent many real-life decisions but are useful for determining alterations in choice behavior when values and probabilities are explicitly presented.
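
To make these components concrete, the sketch below shows one hypothetical way such a model could be written. It is an assumption for illustration, not the specification used in any of the cited studies: utility is expected value minus linear penalties for outcome variance (risk), a supplied ambiguity level, and expected losses, and choice follows a logistic rule. All function and parameter names are invented here.

```python
import math


def option_utility(outcomes, probabilities, ambiguity=0.0,
                   risk_aversion=0.0, ambiguity_aversion=0.0,
                   loss_aversion=1.0):
    """Expected value minus assumed linear penalties for risk, ambiguity, and loss."""
    expected_value = sum(p * x for p, x in zip(probabilities, outcomes))
    # Risk is operationalized here as outcome variance (an assumption).
    variance = sum(p * (x - expected_value) ** 2
                   for p, x in zip(probabilities, outcomes))
    # Expected magnitude of losses; loss_aversion = 1 means no extra weight.
    expected_loss = sum(p * -x for p, x in zip(probabilities, outcomes) if x < 0)
    return (expected_value
            - risk_aversion * variance
            - ambiguity_aversion * ambiguity
            - (loss_aversion - 1.0) * expected_loss)


def choice_probability(utility_a, utility_b, inverse_temperature=1.0):
    """Logistic probability of choosing option A over option B."""
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (utility_a - utility_b)))


# A sure gain of 5 versus a 50/50 gamble between 10 and 0 (equal expected value).
sure = option_utility([5.0], [1.0])
gamble_neutral = option_utility([10.0, 0.0], [0.5, 0.5])
gamble_averse = option_utility([10.0, 0.0], [0.5, 0.5], risk_aversion=0.1)

p_gamble_neutral = choice_probability(gamble_neutral, sure)
p_gamble_averse = choice_probability(gamble_averse, sure)
print(p_gamble_neutral, p_gamble_averse)
```

In this sketch, a risk-neutral chooser is indifferent between the two options because their expected values match, whereas a positive risk-aversion parameter lowers the probability of choosing the gamble.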

Studies of risk, ambiguity, and loss aversion in anxiety point to increased risk and ambiguity aversion, particularly with losses, but no differences in loss aversion (Charpentier et al., 2017; Giorgetta et al., 2012; Hartley & Phelps, 2012; Maner et al., 2007, but see Sip et al., 2018). This pattern of decision-making differences indicates that people with high anxiety do not avoid choices with potential negative outcomes more than people with less anxiety, but they avoid choices where such outcomes are less predictable due to greater risk or ambiguity. These findings are consistent with intact valuation and decision-making with negative outcomes accompanied by difficulties deciding among uncertain, potentially negative outcomes. Risk and ambiguity aversion may also relate to greater psychophysiological responses for uncertain, but not certain, threat in anxiety (Gorka et al., 2017; Grillon et al., 2008, 2009). These findings suggest that anxiety is related to problems in choice processes. However, most real-world decisions require learning about values and probabilities, which in turn affect choices based on these experiences (Hertwig et al., 2004). Understanding how choice processes interact with potential learning differences in anxiety requires a fuller understanding of how learning and choice interact and go awry in anxiety.

Reinforcement learning studies of anxiety

Process-based models of learning can fill the explanation gap left by descriptive models of choice by illustrating how value and uncertainty are learned and used in decisions. Reinforcement learning conceptualizes value learning as driven by prediction errors from comparing received to expected values (Rescorla & Wagner, 1972; Schultz et al., 1997). Prediction errors are then used to update future expectations of value; a learning rate parameter in the model controls how rapidly these updates occur. In the simplest neural account of reinforcement learning, dopaminergic projections from the ventral tegmental area to striatum, particularly ventral striatum, communicate prediction error signals while ventromedial prefrontal cortex tracks value (Bartra et al., 2013; Diederen et al., 2016; Montague et al., 1996; Rangel et al., 2008). To track complex environments more representative of the real world, basic reinforcement learning models require modification. Specifically, they must track dynamic changes in task contingencies as well as different types of uncertainty that can help discern change points from other sources of variability. Behaviorally, these models incorporate effects of uncertainty on dynamic changes in learning rate.
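
The basic error-driven update described above can be sketched as a delta rule in the style of Rescorla and Wagner (1972), with a fixed learning rate. The function name, the starting value, and the example outcome sequence are illustrative assumptions.

```python
def delta_rule(outcomes, learning_rate=0.1, initial_value=0.0):
    """Update an expected value from a sequence of outcomes with a fixed learning rate."""
    value = initial_value
    values, prediction_errors = [], []
    for outcome in outcomes:
        # Prediction error: received minus expected value.
        prediction_error = outcome - value
        # The learning rate controls how quickly expectations are updated.
        value += learning_rate * prediction_error
        prediction_errors.append(prediction_error)
        values.append(value)
    return values, prediction_errors


values, prediction_errors = delta_rule([1.0, 1.0, 1.0], learning_rate=0.5)
print(values)
print(prediction_errors)
```

With a learning rate of 0.5 and three consecutive outcomes of 1, the expected value moves halfway toward the outcome on each trial (0.5, 0.75, 0.875), and the prediction error halves each time. Because the learning rate is fixed, this basic form cannot distinguish noise from changes in contingencies.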

Models tracking uncertainty during learning have shown the most consistent learning alterations in anxiety. Findings using these models are summarized in Table 2 and show several commonalities (of note, the included studies were based on the authors’ knowledge of the literature and augmented by PubMed and Google Scholar searches using the terms “reinforcement learning AND (anxiety OR PTSD OR OCD)” but do not represent a comprehensive, structured literature review following PRISMA guidelines).

Table 2 Computational studies of disrupted learning from uncertainty in anxiety

First, these studies find differences in parameters representing learning rate adjustments (including associability, or the amount learning rates adjust based on past absolute prediction errors associated with a stimulus; Le Pelley, 2004; Pearce & Hall, 1980), latent state formation (tendency to attribute outcomes to one cause versus another), and other learning measures dealing with uncertainty. Other parameters measuring overall learning rate, choice consistency, and outcome valuation show little relationship with anxiety. Second, studies with reward and loss learning conditions have generally found differences during loss learning only (Brown et al., 2018; Browning et al., 2015), although some have found differences in both conditions (Gagne et al., 2020).
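
Associability can be sketched with a hybrid update in the spirit of Pearce and Hall (1980), in which the effective learning rate on each trial is scaled by a running average of recent absolute prediction errors. This is one common formulation written here as an assumption; the parameter names (`kappa` for the learning-rate scale, `eta` for the associability update weight) and values are hypothetical and are not taken from the studies in Table 2.

```python
def hybrid_associability(outcomes, kappa=0.5, eta=0.3,
                         initial_value=0.0, initial_associability=1.0):
    """Value learning with an associability-gated learning rate."""
    value = initial_value
    associability = initial_associability
    values, associabilities = [], []
    for outcome in outcomes:
        prediction_error = outcome - value
        # Value update uses associability built from past prediction errors.
        value += kappa * associability * prediction_error
        # Associability tracks recent absolute prediction errors.
        associability = eta * abs(prediction_error) + (1.0 - eta) * associability
        values.append(value)
        associabilities.append(associability)
    return values, associabilities


# Ten reinforced trials followed by three unreinforced trials (a reversal).
outcomes = [1.0] * 10 + [0.0] * 3
values, associabilities = hybrid_associability(outcomes)
print([round(a, 3) for a in associabilities])
```

In this sketch, associability declines while outcomes are well predicted and rises again after the surprising outcome at the reversal, so the learning rate accelerates when contingencies appear to have changed.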

Lastly, there may be some diagnostic specificity among internalizing disorders. Internalizing disorders comprise depression, anxiety, and related disorders (including PTSD and OCD) and can be further divided into fear- versus distress-based disorders and symptoms (Craske et al., 2009; Watson, 2005). Specifically, some studies have found learning differences that were related to fear-based, and not distress-based, disorders and symptoms (Brown et al., 2018; Norbury et al., 2021). In contrast, a study comparing GAD and MDD did not find specificity to symptoms of one of these disorders (Gagne et al., 2020). One explanation may be that fear-related symptoms and disorders are more related to these learning differences than distress-related disorders (Clark & Watson, 1991; McTeague & Lang, 2012b).

Other learning differences in the studies reviewed in Table 2 indicate differences in learning from good versus bad outcomes, but this learning impairment also appears in studies of depression (Abend et al., 2022; Brown et al., 2021; Gagne et al., 2020; Lamba et al., 2020; Wise & Dolan, 2020). The direction of results in individual studies varies; a recent computational meta-analysis suggested that anxiety and depression show greater punishment learning rates (Pike & Robinson, 2022). However, investigating the specificity of uncertainty-related learning disruptions to fear-based symptoms and disorders requires further studies in samples with a broad range of fear and distress-related symptoms and more precise measures of these symptom dimensions, rather than relying on measures such as trait anxiety that measure internalizing (spanning depression and anxiety) symptoms more generally (Knowles & Olatunji, 2020).

Anxiety as maladaptive aversive uncertainty learning

Given the task design and model formulation differences in the studies reviewed above, what consistent learning differences are present in anxiety? Most of the above studies manipulated unexpected or irreducible uncertainty, but not both. Adaptive learning requires simultaneously assessing the level of different types of uncertainty in one’s environment and adjusting learning accordingly. In tasks or blocks with high unexpected uncertainty (20 or fewer trials per block; Brown et al., 2018; Zika et al., 2022 and volatile blocks in Browning et al., 2015; Gagne et al., 2020), people with anxiety show slightly slower learning overall, but accelerate learning more after very surprising outcomes that indicate obvious changes. Meanwhile, in tasks or conditions with high irreducible uncertainty (75%/25% contingency or less), people with anxiety show a generally higher learning rate, particularly after surprising losses (Homan et al., 2019; stable blocks in Browning et al., 2015; Gagne et al., 2020)—although this effect is less consistent (Norbury et al., 2021; Zika et al., 2022). These patterns suggest that people with anxiety have an impaired ability to discern whether prediction errors result from irreducible uncertainty, requiring a slower learning rate, or true changes in contingencies, requiring a higher learning rate. As a result, they show lower learning rates when learning rates should be high from high unexpected uncertainty and higher learning rates when learning should be slow from noisy outcomes. Other proposed explanations for this pattern of results include that people with anxiety may infer more information content from negative outcomes (Pulcu & Browning, 2019) or show specific reductions in estimated irreducible uncertainty (Piray & Daw, 2021). Crucially, despite inconsistent results from individual studies, using a common theory and family of generative models enables synthesis across studies to propose and test a shared impairment.
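The idea that learning rates should rise after surprising outcomes and fall when outcomes are predictable can be illustrated with a minimal sketch. It assumes a Pearce-Hall-style hybrid update in which associability tracks recent absolute prediction errors and scales the learning rate. The function name, the parameter values (kappa, eta), and the outcome sequence are hypothetical and are not taken from any of the reviewed studies.

```python
def pearce_hall(outcomes, kappa=0.5, eta=0.3, v0=0.0, alpha0=1.0):
    """Return the value and associability held at the start of each trial.

    kappa scales the learning rate; eta sets how quickly associability
    follows the absolute prediction error (both values are illustrative).
    """
    v, alpha = v0, alpha0
    values, assoc = [], []
    for outcome in outcomes:
        values.append(v)
        assoc.append(alpha)
        delta = outcome - v                           # prediction error
        v = v + kappa * alpha * delta                 # effective learning rate is kappa * alpha
        alpha = (1 - eta) * alpha + eta * abs(delta)  # associability tracks |delta|
    return values, assoc

# Hypothetical sequence: a stable contingency followed by an abrupt change
outcomes = [1.0] * 20 + [0.0] * 5
values, assoc = pearce_hall(outcomes)
```

In this sketch, associability decays while outcomes are predictable and jumps on the trial after the change, so the same prediction error produces a larger update once a change has been signaled. A learner that misattributes noise to change, or change to noise, would adjust associability at the wrong times.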

Maladaptive uncertainty learning can explain many noncomputational findings. Failure to discern between irreducible and unexpected uncertainty can lead to difficulties determining when contexts have changed, potentially explaining fear extinction and retention impairments commonly seen in anxiety (Duits et al., 2015; Marin et al., 2017). Within a context, difficulties estimating the amount of irreducible uncertainty present may impair detection of consistently safe stimuli, as is found with reduced safety signal learning (Duits et al., 2015), and discerning differences in contingencies between similar stimuli, as seen with overgeneralization of conditioned fear (Cooper et al., 2022; Fraunfelter et al., 2022). In terms of neural function, anxiety, and particularly panic and other fear-based disorders, is linked to impaired noradrenergic function (Bremner et al., 1996; Charney, 1984; Hendrickson & Raskind, 2016; Kalk et al., 2011; Naegeli et al., 2018). Disrupted noradrenergic signaling leads to excessive activation of the salience network (Etkin & Wager, 2007; Sylvester et al., 2012), impairing learning about uncertainty. Altered activation of regions that control uncertainty signals, including prefrontal cortex and hippocampus, in anxiety (Aupperle & Paulus, 2010; Marin et al., 2017, 2020; Sun et al., 2020) leads to impaired regulation of these uncertainty signals.

Avoidance and anxiety

Current theories of avoidance in anxiety

Avoidance of feared situations is a disabling aspect of anxiety and the primary target in exposure therapy and other treatments. Why anxious people show persistent avoidance in the absence of outcomes is difficult to explain from the traditional behaviorist perspective (Richter et al., 2017). How is a (lack of) behavior maintained in the absence of any reinforcement? One influential early account of avoidance was Mowrer’s two-factor theory (Mowrer, 1951). It proposes that in the first, Pavlovian stage, aversive unconditioned stimuli cause the conditioned stimulus to be associated with fear; avoidance behavior is then reinforced by reductions in learned fear through instrumental learning. For example, if a person is bitten by a dog in a dog park, the dog park becomes associated with fear (stage one: Pavlovian conditioning). If the person then later approaches the dog park, their fear increases, and leaving the area of the dog park—avoidance—is reinforced by a reduction in this fear (stage two: instrumental learning). Despite providing an intuitive account of avoidance that is still employed clinically, two-factor theory had several inconsistencies (Rachman, 1976; Solomon & Wynne, 1954): among them, conditioned avoidance is remarkably resistant to extinction, continues even if it does not terminate the conditioned stimulus, and persists in the absence of fear.

After the demise of two-factor theory, research on instrumental avoidance learning waned and studies of learning in anxiety shifted to simpler Pavlovian fear conditioning paradigms (LeDoux et al., 2017). Some work on instrumental theories continued, notably resulting in Lovibond’s cognitive expectancy model (Lovibond et al., 2008). This model, like two-factor theory, proposes that the relationship between an aversive outcome and a conditioned stimulus is acquired through Pavlovian conditioning while avoidance of the conditioned stimulus is learned through instrumental learning. However, what is learned in the first, Pavlovian, stage is the expectancy of a negative outcome, not fear. During the second stage the organism learns which actions lead to the conditioned stimulus and which actions lead to safer outcomes with a lower expectation of the negative outcome. This avoidance of the expectancy of negative outcomes is processed as the removal of an aversive stimulus and so is reinforcing. Reframing learning in terms of expectancies, rather than drive reduction, addresses many of the difficulties that two-factor theory encountered. For example, expectancies maintain responses during extinction and in the absence of fear. However, the abstract concept of expectancy, with its roots in cognitive models, has been difficult to integrate into behavioral models of threat learning and avoidance.

Other recent theories of avoidance abandoned explanations of threat avoidance as an instrumental phenomenon. Drawing on nonhuman studies, LeDoux and colleagues (LeDoux et al., 2017) proposed that avoidance behavior can be broken down into three, potentially sequential, processes: Pavlovian conditioning leading to species-specific defensive responses (Bolles, 1970), such as withdrawal or freezing controlled by a lateral amygdala–central amygdala–periaqueductal gray circuit; goal-directed avoidance responses triggered by infralimbic prefrontal cortex (analogue of the human vmPFC) and controlled by a lateral amygdala–basal amygdala–nucleus accumbens circuit; and dorsal striatum-based habitual avoidance responses. In contrast to goal-directed avoidance behaviors, which are outcome-sensitive (e.g., “I avoid the mall, because I tend to get panic attacks there”), habitual responses are stimulus-triggered and outcome-insensitive (e.g., “I avoid large crowds”). LeDoux and colleagues refer to the concept of active versus passive coping to explain how avoidance goes awry in anxiety: passive coping, which they relate to maladaptive avoidance, is characterized by maladaptive Pavlovian defensive responses like withdrawal. Active coping can be more adaptive but can become pathological when habitual responses predominate. Therefore, LeDoux and colleagues propose that maladaptive avoidance in anxiety is caused by either excessive Pavlovian defensive responses or excessive habitual responses rather than adaptive goal-directed responses.

In a more clinically focused review, Arnaudova et al. (2017) suggested several other reasons for maladaptive avoidance in anxiety outside of instrumental learning: increased threat appraisal, increased automatic avoidance tendencies, decreased regulation of avoidance in the service of other goals, habitual avoidance responding from overtraining, and increasing psychological distance through experiential avoidance. As with models of uncertainty in anxiety, though, the authors do not offer a unifying framework explaining how these processes relate to each other and disruptions in anxiety. In addition, some of these explanations (increased threat appraisal, experiential avoidance, and decreased regulation of avoidance in service of other goals) can indeed be conceptualized with updated instrumental learning models (Huys & Renz, 2017; Lovibond et al., 2008). The other explanations, increased automatic avoidance tendencies and habitual avoidance responding, are similar to the processes proposed by LeDoux and colleagues; these are reviewed in the next paragraph.

Excessive Pavlovian withdrawal in the presence of aversive stimuli in anxiety has some evidence in the literature (Grillon et al., 2017; Loijen et al., 2020; Mkrtchian et al., 2017; but see Struijs et al., 2017). These studies do not show specificity to anxiety; instead, increased inhibition may be a feature of all internalizing disorders. Therefore, people with anxiety and other internalizing disorders may engage in excessive withdrawal with negative stimuli, manifesting as increased passive avoidance, as theorized by Arnaudova and colleagues (Arnaudova et al., 2017; increased automatic avoidance tendencies) and LeDoux and colleagues (LeDoux et al., 2017; maladaptive Pavlovian defensive responses). The evidence for habit learning abnormalities in anxiety is less clear. Studies with large, transdiagnostic samples show that increased habit learning, as measured by reduced model-based planning, is due to compulsive symptoms, not anxiety (Gillan et al., 2016, 2020, 2021; Heller et al., 2018), and people with anxiety actually show greater model-based planning in some contexts (Hunter et al., 2021). In addition, true habitual avoidance in humans may be much more limited than in animal studies of active avoidance (de Wit et al., 2018); studies finding greater habitual learning may instead be picking up on reduced model-based learning in compulsivity. Regardless, maladaptive avoidance in anxiety does not seem to stem from an increased tendency to form habits.

Other relevant theories of defensive behaviors, including avoidance, come from ethology and have not been empirically tested in anxiety. One idea is that behavior depends on the imminence of predator threat (Fanselow, 1994; Levy & Schiller, 2021; Mobbs et al., 2015, 2020): animals behave very differently when safe (predator threat unlikely), under pre-encounter threat (threat not present but more likely), post-encounter threat (threat present but not attacking), and circa-strike threat (predator attacking). Active avoidance behaviors are most likely during pre-encounter threat, where model-based planning to avoid greater threat is helpful, whereas passive avoidance and stereotyped defensive behaviors become more likely as the threat becomes imminent, during post-encounter or circa-strike threat (Mobbs et al., 2020). Threat imminence theories have not incorporated effects of uncertainty on defensive behavior; related work on safety proposes that certainty about safety, the opposite of uncertainty about threat, reduces defensive behaviors (Tashjian et al., 2021), whereas greater uncertainty about the source of threat in anxiety may increase perceived threat imminence (Raymond et al., 2017).

In summary, although Mowrer’s two-factor theory informed exposure therapy, it fails to explain many behaviors. Subsequent writers offered explanations for a broader range of avoidance behaviors, but these explanations required a departure from behaviorist specifications, impeding integration with current reinforcement learning theories. Accounts of avoidance that do not consider instrumental learning cannot explain instrumental behaviors that comprise active avoidance or else postulate behavioral tendencies, such as increased reliance on habitual behavior, that are inconsistent with empirical findings in anxiety. Ethological ideas about threat imminence help to understand situations where avoidance behavior arises but do not address the role of uncertainty and have yet to be applied to anxiety. In the following sections, we consider how reinforcement learning can formalize some of these ideas, such as cognitive expectancies and a spectrum of defensive behaviors, and advance a process account of uncertainty learning to explain avoidance in anxiety.

The explore-exploit dilemma and avoidance

Avoidance involves choices or actions, which are absent in Pavlovian conditioning paradigms primarily used to study anxiety. Choices an organism or agent makes about which stimuli they encounter and learn from affect what they know about their world. Uncertainty about the outcomes of available choices gives rise to the explore-exploit dilemma (Kaelbling et al., 1996; Sutton & Barto, 1998). This dilemma is whether one should choose certain, high value options to maximize short-term rewards (exploitation) or sample alternatives with lower or unknown values in the hope of discovering superior options and maximizing long-term rewards (exploration). Neurally, components of the frontoparietal control network as well as striatum and amygdala show activation to uncertainty when explore/exploit decisions are required (Blanchard & Gershman, 2018; Hogeveen et al., 2022).

The relative benefits of exploitation versus exploration depend on the type of uncertainty (Cohen et al., 2007). Exploring to reduce uncertainty can maximize value in the long run if estimation uncertainty predominates, such as at the beginning of a block of trials (Rich & Gureckis, 2018; Wilson et al., 2014, but see Payzan-LeNestour & Bossaerts, 2012). With high estimation uncertainty and several chances to maximize value, initial exploration reduces uncertainty, leading in turn to better knowledge of contingencies and choices resulting in higher-valued outcomes on later trials. Exploration also increases with high unexpected uncertainty if it helps detect possible change points (Navarro et al., 2016). In contrast, if uncertainty is primarily irreducible, exploration will not reduce uncertainty and so will not improve contingency knowledge for later choices. Exploration also is unhelpful in single-shot learning or brief episodes where too few future trials exist to benefit from reducing uncertainty (Rich & Gureckis, 2018; Wilson et al., 2014). This effect of differences in uncertainty on exploration versus exploitation is similar to uncertainty’s effect on learning rates: high relative estimation uncertainty leads to high learning rates and exploration, whereas high relative irreducible uncertainty leads to low learning rates and exploitation. The effect of unexpected uncertainty is unclear; because it leads to high learning rates, it may cause more exploration, or it may reduce exploration if uncertainty about contingencies possibly changing makes learning more about the current environment unhelpful.
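The contrast between estimation and irreducible uncertainty in rewarding environments can be sketched with a toy choice rule. This is an illustrative assumption rather than the model used in any cited study: an option with binary outcomes is represented by a Beta posterior, the posterior standard deviation stands in for estimation uncertainty, and an information bonus is granted only when future trials remain. The function names, the `bonus` weight, and the example counts are hypothetical.

```python
import math

def beta_mean_sd(successes, failures):
    """Posterior mean and SD of a reward probability under a uniform Beta(1, 1) prior."""
    a, b = successes + 1, failures + 1
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

def explores(successes, failures, known_value, trials_left, bonus=1.0):
    """True if the uncertain option is chosen over an option of known value."""
    mean, sd = beta_mean_sd(successes, failures)
    # Information gained now only pays off if later trials can use it
    info_bonus = bonus * sd if trials_left > 1 else 0.0
    return mean + info_bonus > known_value

unsampled_long_horizon = explores(0, 0, known_value=0.6, trials_left=10)
unsampled_single_shot = explores(0, 0, known_value=0.6, trials_left=1)
well_sampled_noisy = explores(50, 50, known_value=0.6, trials_left=10)
```

Under these assumptions, the never-sampled option is explored when several trials remain but not in a single-shot choice. The well-sampled option is not explored even though its outcomes remain a coin flip, because further sampling would not change the estimate: what remains is irreducible uncertainty.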

These relationships between uncertainty and exploration apply to environments with rewarding outcomes. In aversive contexts, people show a different pattern of behavior: when choosing among loss outcomes, humans and other organisms seem to shift their goal from maximizing long-term value to avoiding negative outcomes as much as possible in the short term. In environments where all outcomes are negative, more uncertain outcomes offer the prospect of potential safety. This search for safety causes people to become more uncertainty seeking overall, even during single-shot learning (Krueger et al., 2017; Lejarraga & Hertwig, 2017). People also explore high estimation uncertainty options less when some options lead to catastrophic failure (Schulz et al., 2018). People may overexplore when uncertainty cannot be reduced or controlled, as with irreducible uncertainty, and underexplore to stay in safe areas when uncertainty can be controlled, as with estimation uncertainty. The effect of unexpected uncertainty on exploration in aversive contexts has yet to be investigated, but, like estimation uncertainty, it may decrease exploration, especially if the option to avoid making any potentially dangerous choice is available.

This myopic focus on minimizing short-term losses in aversive environments appears counterproductive but could improve survival in real-world environments where a small number of negative outcomes could cause injury, illness, or death and foreclose the chance of any future choices (Bateson, 2002; Korn & Bach, 2015; Mehlhorn et al., 2015). If anxiety is related to maladaptive aversive uncertainty learning, avoidance would result from increased exploitation and decreased exploration from misperception of uncertainty. However, fully formulating this effect requires an understanding of how instrumental avoidance relates to reinforcement learning.

Temporal difference Markov decision process models of avoidance

Lovibond’s cognitive expectancy model was formulated as “cognitive,” because a behaviorist framework could not accommodate the concept of internal expectancies as a driver of behavior. However, computational theories of temporal difference reinforcement learning (Sutton & Barto, 1987, 1998) are able to capture similar processes fully within a learning framework (Maia, 2010; Moutoussis et al., 2008; Seymour et al., 2004). To explain avoidance, these models must explain how previous actions and values influence later outcomes. The RL paradigms and models reviewed earlier can explain only single-step learning, where learning and choice behavior is influenced by immediate rewards and punishments. In RL, these are known as bandit problems. A more extended formulation of RL, known as Markov decision processes or MDPs, can explain how actions influence future outcomes as well.

MDP models of reinforcement learning describe learning as a multistep process, where a person (or more generally, an agent) travels through several states, some of which are associated with rewards (or losses; Fig. 2; Sutton & Barto, 1998). Often, an outcome is only experienced at the end of a trip, or episode, through the MDP. States with outcomes are termed terminal states, as the episode ends in one of these states and results in reward or punishment. The values of earlier actions and states are learned based on their relationships to the outcomes obtained in terminal states. In an MDP, transitions between each set of states acquire values that depend on the probabilities of transitioning to future states in the MDP (transition probabilities) and the outcomes received when transitioning to a terminal state. Therefore, an MDP consists of states, rewards (in MDP terminology, rewards encompass all outcomes, including negative ones), actions describing what is done at each state, and transition probabilities between states.

Fig. 2

Markov decision processes. A. MDP formulation of an instrumental learning paradigm or bandit task. Each trial has two stimuli: one with a 75% chance of leading to reward and a 25% chance of leading to no reward; the other stimulus has the opposite probabilities. This MDP has three states: the state prior to choice (s3), where the agent is presented with the two stimuli; two outcome, or terminal, states, s2 and s1; and two actions, choosing one stimulus (a1) and choosing the other (a2). Transitions from state s3 to the two terminal states s1 and s2 have rewards of 1 and 0, respectively. There is a transition probability for each action-transition pair: Pa1(s3, s1) (the probability of transitioning from state s3 to state s1 given action a1) is 0.75, Pa1(s3, s2) is 0.25, Pa2(s3, s1) is 0.25, and Pa2(s3, s2) is 0.75; these probabilities re-express the 75%/25% and 25%/75% outcome contingencies for the stimuli. In this MDP, there is one choice: select either a1 or a2. B. MDP formulation of an instrumental learning paradigm with two steps to the outcome. Compared to the MDP in panel A, this MDP has another state (s4) with another set of stimuli to choose between that lead to reward, and a state (s5) where one chooses between stimuli that probabilistically lead to states s4 or s3. C. MDP of a set of possible states and actions for a person with social anxiety going to a party. This example illustrates a subset of potential states and actions (e.g., another possible state after the “make a joke” action is that the group laughs at your joke, but then you accidentally sneeze on someone, which also leads to the core fear of negative social evaluation). Note that there are three terminal states: staying in a conversation (person’s goal, small positive reward), returning home without engaging in conversation (total avoidance, no reward), and being excluded (feared outcome, large negative reward). D. Abstraction of the MDP in Panel C. E. First stage of Pavlovian learning as proposed by Moutoussis et al. (2008) and Maia (2010). Values propagate from the terminal states to other states and actions through learning. Values inside each state (represented by boxes) represent the learned value of each state, whereas values next to arrows represent the change in value when transitioning between those states. F. Second stage of instrumental learning as proposed by Moutoussis et al. (2008) and Maia (2010). Actions are taken with frequencies, denoted by line thickness, based on the values acquired through Pavlovian learning in the first stage.

As an example of a very simple MDP, we can reconceptualize a trial from a basic instrumental learning task (Fig. 2A). In the simple MDP described in Fig. 2A, all state transition rewards are immediately experienced. In a larger MDP, some state transitions lack immediate reward and must be learned (Fig. 2B, which is similar to the design of the popular two-step task; Daw et al., 2011). How choices are made is called a policy in MDP terminology, which can be the choice processes discussed earlier or learned over time through reinforcement learning. In the Moutoussis et al. (2008) and Maia (2010) models, the tendency to choose each action is slowly increased or decreased based on whether the resulting reward is better or worse than expected using an actor-critic architecture. In the two-step MDP in Fig. 2B, the transitions from state s5 to s4 and s5 to s3 have no immediate reward; instead, the values of these state transitions are learned through experience and depend on the combination of policy, actions, and other states’ immediate rewards.
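The single-step MDP of Fig. 2A is small enough to write out directly. The sketch below encodes the transition probabilities and rewards given in the figure caption and computes the expected value of each action; the variable and function names are illustrative, and the uniform random policy used for the state value is an assumption added for the example.

```python
# Transition probabilities from the pre-choice state s3, indexed by action (Fig. 2A)
P = {
    "a1": {"s1": 0.75, "s2": 0.25},
    "a2": {"s1": 0.25, "s2": 0.75},
}
# Reward received on entering each terminal state
R = {"s1": 1.0, "s2": 0.0}

def action_value(action):
    """Expected immediate reward of taking `action` in state s3."""
    return sum(prob * R[state] for state, prob in P[action].items())

q = {action: action_value(action) for action in P}

# Value of s3 under an assumed policy that picks each action half the time
v_s3_random_policy = sum(0.5 * q[action] for action in q)
```

Because every transition here ends in a terminal state, the action values equal the reward contingencies (0.75 and 0.25). In the two-step case, the value of a first-step transition would instead depend on values like `v_s3_random_policy`, which change with the policy followed at later states.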

According to temporal-difference theories of avoidance learning using MDPs, each component of avoidance behaviors is represented by states and actions (Figs. 2C and D). Initially, the unconditioned stimulus is encountered, resulting in a negatively valued reward. Through Pavlovian conditioning, the state transitions more likely to lead to the unconditioned stimulus acquire more negative values than state transitions with a low probability of doing so (Fig. 2E). Then, when choosing actions, the agent preferentially chooses higher-valued state transitions less likely to result in the negative outcome (Fig. 2F). In the model of Moutoussis et al. (2008), comparing the outcome after successful avoidance to other possible outcomes (i.e., encountering the unconditioned stimulus) using advantage learning (Dayan & Balleine, 2002) results in a positive value for avoidance behaviors and serves as the instrumental reinforcer. In Maia (2010), the reinforcement comes from a reduction in negative values when transitioning into safer states. In both models, avoidance continues to be reinforcing even if the negative outcome is not experienced after initial conditioning.
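The two stages can be sketched with a deliberately reduced actor-critic simulation. It assumes a single warning-cue state, an aversive outcome coded as a reward of -1, a safe terminal state with reward 0, and illustrative learning rates. It is not a reimplementation of either cited model, and it does not attempt to reproduce how those models keep avoidance reinforcing over long periods.

```python
def td_avoidance(n_conditioning=30, n_avoid=10, lr_critic=0.2, lr_actor=0.5):
    """Two-stage sketch: Pavlovian value learning, then instrumental avoidance."""
    v_cue = 0.0        # critic: learned value of the warning-cue state
    pref_avoid = 0.0   # actor: preference for the avoidance action

    # Stage 1 (Pavlovian): the cue is followed by the aversive outcome
    for _ in range(n_conditioning):
        delta = -1.0 - v_cue       # terminal transition, so no successor value
        v_cue += lr_critic * delta
    v_after_conditioning = v_cue

    # Stage 2 (instrumental): avoiding leads to the safe terminal state
    deltas = []
    for _ in range(n_avoid):
        delta = 0.0 - v_cue        # outcome is better than expected, so delta > 0
        pref_avoid += lr_actor * delta
        v_cue += lr_critic * delta
        deltas.append(delta)
    return v_after_conditioning, pref_avoid, deltas

v_conditioned, pref_avoid, relief = td_avoidance()
```

After conditioning, the cue carries a strongly negative value, so reaching safety produces a positive prediction error that strengthens the avoidance preference even though no aversive outcome is delivered in the second stage. In this simplified critic the relief signal shrinks as the cue value recovers toward zero.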

It is difficult, however, for these temporal difference models of avoidance to explain avoidance in anxiety. One feature of anxious avoidance is that the feared outcome (in the example in Fig. 2, being socially shunned) is rarely or never encountered. The proposed models require this outcome to be experienced during initial Pavlovian conditioning. Human learning, however, can happen without direct experience. Internal simulations of possible outcomes, observational learning, and instructed learning all cause states to acquire negative values and to be avoided (Askew & Field, 2007; Dymond et al., 2012; Muris & Field, 2010; Olsson & Phelps, 2004, 2007; Rachman, 1977). For example, a person who fears a panic attack in a public place because it may lead to a heart attack and death may have learned this feared outcome from: simulating a possible negative outcome of death when feeling faint; hearing about a person who felt faint and then had a heart attack; observing a parent avoiding feeling faint and inferring a negative outcome from the parent’s avoidance; or being told that chest tightness is a sign of a heart attack.

Another component of MDPs relevant to clinical avoidance, but not yet addressed in temporal difference MDP models of avoidance, is the role of cached sequences of states and actions, or options (Sutton et al., 1999). Options allow sequences of states and actions to be grouped and executed together. In scenarios where certain sequences occur often, such as the steps of cooking a meal, options simplify complex MDPs by reducing them into sets of sequences rather than evaluating each state and action individually. Humans show frequent and flexible use of options and similar forms of cached sequences (Botvinick et al., 2009; Huys et al., 2015; Xia & Collins, 2021). People may rely on options more with greater threat imminence, as increased threat imminence causes use of less cognitively demanding and more model-free forms of decision-making (Mobbs et al., 2020). Notably, greater use of options leads to inflexible behavior that disregards outcomes of intermediate steps, resembling habitual behavior. Unlike habitual behavior, options are goal-directed behaviors that can be interrupted if indicated. The tendency of people with anxiety to fall back on avoidance behavior when uncertainty is higher or in situations with greater stress may reflect a greater reliance on options with greater threat imminence.

Viewing avoidance as the outcome of temporal difference learning allows us to model differences in learning and choice explicitly and examine their relationships with avoidance behavior. The few existing studies on temporal difference MDP-like learning in anxiety provide some insight into how avoidance behavior may arise in anxiety. Vervliet et al. (2017) measured self-reported positive emotion as a proxy for model-derived positive prediction errors during an avoidance task. Participants’ reported emotions showed similar patterns as prediction errors inferred from a temporal difference account of avoidance, but participants with less distress tolerance had less specific relief signals. They proposed that intolerance of distress, common to all internalizing disorders, may be linked to overgeneralized avoidance learning. Initial modeling simulations also show that modified policies representing pessimistic or catastrophizing choice processes lead to excessively negative valuation and avoidance as well as risk aversion (Gagne & Dayan, 2022; Zorowitz et al., 2020). This form of choice process, where avoiding negative outcomes is favored over maximizing reward, may better represent normative, negative outcome-avoiding behavior in aversive environments than anxiety, however. Other theoretical work has suggested that excessive active avoidance results from differences in policy, where even slightly negative values are not chosen, or from learning, where state transitions would take on more negative values (Raymond et al., 2017); however, these learning and choice alterations have not been empirically demonstrated in anxiety. To our knowledge, no work has yet connected the maladaptive uncertainty learning seen experimentally in anxiety to these models of avoidance.

Uncertainty-sensitive Markov decision-process models and anxious avoidance

Examining how uncertainty shapes explore/exploit decisions with temporal difference MDP models can help us understand how anxious avoidance may result from maladaptive uncertainty learning in anxiety. Adding uncertainty to temporal difference MDP models means that choices depend not only on learned values but also, through explore/exploit effects, on uncertainty (Fig. 3). Increases in exploitation will increase selection of known, safe states, while increased exploration will decrease reducible uncertainty and increase certainty about transition probabilities, but also lead to terminal states with negative outcomes more often. Meanwhile, in an aversive environment, policies shift to minimize negative outcomes in the short term. This shift can be represented by discarding choices whose range of potential outcomes includes more negative values, as in the risk-avoiding policies reviewed above (Gagne & Dayan, 2022; Zorowitz et al., 2020). Therefore, normative behavior in uncertain, aversive MDPs will show avoidance of uncertain states from increased exploitation of safer states. This avoidance maintains high uncertainty and perpetuates avoidance by preventing exploration of more uncertain states. Additionally, increases in any form of uncertainty will reduce the distinction between early safe and threat-related stages, causing previously safe stages to be perceived as more dangerous.

Fig. 3

Effects of uncertainty on explore/exploit decisions within a Markov decision process. Top, effect of each type of uncertainty on exploration in appetitive and aversive environments. In rewarding environments, irreducible uncertainty decreases exploration, while estimation uncertainty increases it; effects are opposite in aversive environments. Effects of unexpected uncertainty have been less studied, so are qualified with ?, but appear to increase exploration in appetitive contexts and so may decrease exploration in aversive contexts. Bottom, possible levels of uncertainty under different circumstances and effects on exploration in aversive environments. As anxiety is hypothesized to primarily affect uncertainty estimation in aversive environments, explore-exploit behavior should show few differences with anxiety when learning about reward.

We propose that in anxiety, impaired learning about aversive uncertainty will increase the amount of estimation uncertainty (from impairments in detecting true change points) and unexpected uncertainty (from problems learning about irreducible uncertainty that cause change points to be over-inferred). Because these types of uncertainty normatively increase exploitation, greater estimation and unexpected uncertainty in anxiety will further increase exploitation of safer states and avoidance of uncertain states. Continued avoidance of uncertain states will maintain inappropriately elevated uncertainty estimates, ensuring continued avoidance.

Uncertainty about threat elicits distress and arousal above what is caused by threat itself (Bechara et al., 1997; de Berker et al., 2016) and causes uncertain states to be perceived as distressing. This distress increases with maladaptive increases in uncertainty, as commonly experienced by people with anxiety, and may explain why states with higher uncertainty are perceived as aversive. Ethologically, uncertainty can modulate perceived threat imminence: one cannot rule out a worst-case scenario where the threat is near. Perceptions of greater threat imminence resulting from high uncertainty could shift defensive behaviors along the threat imminence spectrum toward more model-free or cached actions (Fanselow, 1994; Mobbs et al., 2015, 2020). This shift would promote passive avoidance behaviors, such as withdrawal from or freezing in states with uncertain threats. It also may lead to greater reliance on options and similar decision-making heuristics, resulting in inflexible avoidance behaviors. Neurally, reduced or altered frontoparietal network activation in the presence of altered uncertainty signals in anxiety (Grupe & Nitschke, 2013; Hauner et al., 2012; Sylvester et al., 2012) may underlie changes in explore/exploit decision-making.

Integrating maladaptive aversive uncertainty learning, avoidance, and exposure therapy

Exposure as remediating maladaptive uncertainty learning and resulting avoidance in anxiety

States high in uncertainty, particularly estimation and unexpected uncertainty, will be avoided in aversive environments. Exposure therapy encourages exploration of these avoided states. Through repeated exploration of states initially estimated to have high uncertainty, people with anxiety update uncertainty estimates closer to normative values (Fig. 4)—in contrast to the traditional account of exposure as extinction, where repeated encounters with feared stimuli normalize exaggerated learned threat. These reductions in uncertainty have further effects. First, states acquire more certainty about their likelihood of leading to feared versus safe outcomes. This certainty increases the value of states as their probability of leading to the feared outcome decreases. Second, the reduction in uncertainty itself increases exploration relative to exploitation, encouraging further experiences with previously avoided states. This increase in exploration is the opposite of the feedback cycle caused by avoidance—while avoidance maintains incorrect uncertainty estimates, in turn perpetuating avoidance, increased exploration remediates and reduces uncertainty estimates which in turn favors more exploration. Meanwhile, decreases in uncertainty reduce negative emotions, by reducing uncertainty-related distress, and enable use of more flexible decision-making strategies through decreased perceptions of threat imminence.
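
The two feedback cycles described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions rather than the model proposed in this paper: it uses a two-action choice (hypothetical labels `avoid` and `approach`), a Kalman-filter-style learner whose posterior variance stands in for estimation uncertainty, known outcome noise standing in for irreducible uncertainty, and a fixed penalty on uncertainty to represent an aversive context. Unexpected uncertainty is not modeled. All parameter values, and the function name `simulate`, are hypothetical.

```python
import random


def simulate(n_trials, forced_exposure_trials=0, seed=0):
    """Two-action learner that penalizes estimation uncertainty.

    During the first `forced_exposure_trials` trials the 'approach' action
    is imposed, standing in for therapist-guided exposure; afterwards the
    learner chooses freely.
    """
    rng = random.Random(seed)
    true_mean = {"avoid": 0.0, "approach": 0.5}   # approach is truly better
    noise_var = {"avoid": 0.01, "approach": 0.09}  # irreducible outcome noise
    mean = {"avoid": 0.0, "approach": 0.0}         # learned values
    var = {"avoid": 0.25, "approach": 4.0}         # inflated for approach
    penalty = 1.0                                  # weight on uncertainty
    choices = []
    for t in range(n_trials):
        if t < forced_exposure_trials:
            action = "approach"
        else:
            # Score each action by value minus penalized uncertainty (SD).
            score = {a: mean[a] - penalty * var[a] ** 0.5 for a in mean}
            action = max(score, key=score.get)
        outcome = rng.gauss(true_mean[action], noise_var[action] ** 0.5)
        # Kalman-style update: only the chosen action is updated.
        gain = var[action] / (var[action] + noise_var[action])
        mean[action] += gain * (outcome - mean[action])
        var[action] *= 1.0 - gain
        choices.append(action)
    return choices, mean, var


no_exposure = simulate(n_trials=40)
with_exposure = simulate(n_trials=40, forced_exposure_trials=10)
```

Without exposure, the learner never samples the approach action, so its inflated uncertainty estimate is never corrected and avoidance persists. With a short period of imposed approach, the uncertainty estimate shrinks, the learned value moves toward the true value, and subsequent free choices favor approach without further prompting.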

Fig. 4

Illustration of changes in value and types of uncertainty with anxiety, avoidance, and successful exposure therapy, and effects of changes in value and uncertainty on avoidance. Example scenario, from the social anxiety MDP in Fig. 2, illustrates a choice between an avoidance behavior, leaving the group to stand by the snack table, and a nonavoidance behavior, making a joke during the conversation. In normative behavior, the nonavoidance behavior has higher value but also greater irreducible uncertainty (e.g., there is a greater risk that making a joke will go poorly and lead to negative outcomes compared to leaving to get a snack). With high anxiety, before developing avoidance behaviors, uncertainty is miscalculated as greater estimation and/or unexpected uncertainty rather than irreducible uncertainty, leading to greater tendency to avoid the more uncertain stimulus and choose the safe avoidance behavior. Engaging in avoidance behaviors increases the value of the avoidance vs. nonavoidance behavior (due to the temporal difference learning processes illustrated in Fig. 2E and F). Additionally, greater experience with outcomes stemming from the avoidance behavior reduces uncertainty associated with the chosen behavior while the uncertainty associated with the unchosen, nonavoidance behavior is not reduced. Both of these processes increase the tendency to avoid. Initial exposure sessions begin to correct uncertainty associated with the nonavoidance behavior through experience with that choice’s outcomes. Avoidance is decreased relative to pre-exposure behavior but is still higher than normative behavior. After repeated exposures, the relative values of avoidance vs. nonavoidance behavior normalize along with further corrections in uncertainty calculations. As a result, the tendency to avoid becomes similar to normative behavior.

Conceptualizing exposure as reducing avoidance and uncertainty in this framework can account for empirical findings inconsistent with some previous theories, notably the different effects of between and within session fear reduction on treatment outcomes. In exposure therapy, each individual exposure exercise would serve as a single episode of an MDP. In MDPs, values and uncertainties are generally only updated once the episode is over; therefore, uncertainty estimates will not be updated until the patient completes the exposure and fails to encounter the feared outcome. After completing the exposure, maladaptive uncertainty estimates are updated for the next episode. Therefore, though some habituation to uncertainty-related fear and discomfort may occur during each exposure session, particularly if a single session contains repeated exposure experiences, the primary driver of response is the between-session update in uncertainty values. This update will manifest primarily in changes in between-session differences in reported fear, consistent with clinical findings.
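
The between-session pattern described above follows directly from when the update is applied. The sketch below assumes, for illustration only, that each exposure exercise is one episode, that the stored uncertainty estimate is read but not changed during the episode, and that a single error-driven update toward a more normative value is applied when the episode ends without the feared outcome. The function name `run_sessions`, the starting and normative values, and the learning rate are hypothetical.

```python
def run_sessions(n_sessions=5, steps_per_session=4, learning_rate=0.4):
    """Track an uncertainty estimate that is updated only at episode end."""
    uncertainty = 1.0            # hypothetical inflated starting estimate
    normative_uncertainty = 0.2  # hypothetical value supported by experience
    session_logs = []
    for _ in range(n_sessions):
        # Within an episode the stored estimate is read but not changed.
        within_session = [uncertainty for _ in range(steps_per_session)]
        session_logs.append(within_session)
        # The episode ends without the feared outcome: apply one update.
        uncertainty += learning_rate * (normative_uncertainty - uncertainty)
    return session_logs, uncertainty


session_logs, final_uncertainty = run_sessions()
```

In this sketch the estimate is flat within each session and steps down only between sessions, so any within-session decline in reported fear would have to come from a separate process, such as the habituation noted above.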

Reductions in perceived uncertainty could affect other aspects of avoidance. By reducing threat imminence, lower uncertainty can diminish passive avoidance behavior and reliance on inflexible options. Exposure also may directly target the use of options and other forms of cached sequences. Instead of engaging in stereotyped sequences of avoidance behaviors (e.g., always using a remote entrance in a mall and taking a longer path to the intended store to avoid crowds), exposures encourage alternate behaviors incompatible with these sequences (e.g., taking the most direct path to the store) and induce more flexible behavior.

One open question is whether exposure therapy reduces learned uncertainty estimates or uncertainty learning itself. It is possible that, given multiple exposure sessions, patients become more accurate at updating values based on irreducible and unexpected uncertainty and show an improvement in underlying learning mechanisms. However, learning itself may not change during exposure, especially if exposure is not extensively practiced. Corrections in uncertainty estimates, but not vulnerabilities in uncertainty learning itself, may explain relapse after response to exposure therapy (Vervliet et al., 2013) and serve as a treatment target for novel therapeutic approaches.

Practical advantages of incorporating uncertainty into anxiety treatment

Reconceptualizing clinical anxiety in our proposed framework resolves many of the issues with exposure therapy and other forms of anxiety treatment. This framework addresses components of exposure that are not explained well by current theories of exposure therapy and explains how nonbehavioral treatments can also remediate maladaptive uncertainty learning. It also facilitates applications of basic research into learning processes and synergistic somatic treatments to exposure therapy.

First, remediating maladaptive uncertainty learning explains how components like avoidance and the need to target core fears relate to exposure therapy. This framework places avoidance as a central target of exposure: avoidance results from miscalculations of uncertainty and is reduced through exploration during exposure. Maladaptive avoidance, therefore, represents the extent of uncertainty miscalculations and decreases as uncertainty calculations are remediated. The importance of ensuring a focus on a patient’s core fear also can be explained within this framework: a core fear is represented by a specific terminal state in a specific MDP. Targeting portions of an MDP that are related to the core fear will lead to changes in uncertainty in that part of the MDP, rather than in other parts of the MDP (or other MDPs altogether) that do not affect that core fear.

Second, changes in uncertainty can also be accounted for by non-behavioral approaches. Although they target maladaptive uncertainty learning in different ways, shared effects on uncertainty may explain similar processes and outcomes across treatment approaches (Arch & Craske, 2008; Carpenter et al., 2018). Cognitive therapy techniques such as cognitive challenges and behavioral experiments also test relationships among components of an MDP to remediate maladaptive uncertainty calculations. “Downward arrow,” where patients make concrete connections between each layer of fear until reaching their core fear, and similar cognitive techniques make relationships between sequences of actions and states explicit, breaking up options. Third-wave mindfulness approaches target the distress resulting from increased uncertainty by embracing the inability to predict and control outcomes; these and other distress tolerance-focused approaches may reduce the effect of uncertainty on explore-exploit decisions and so reduce the effect of maladaptive uncertainty calculations on avoidance.

Next steps for understanding uncertainty dysfunctions in anxiety and treatment mechanisms

Conceptualizing disrupted uncertainty learning as the basis for avoidance and other impairments in anxiety provides several avenues to test this hypothesis and its predictions. First, although anxiety is related to greater physiological responses to uncertain threat in Pavlovian paradigms, the hypothesized relationship between impaired uncertainty learning, miscalculated uncertainty estimates, and avoidance needs to be tested using instrumental learning paradigms. These learning paradigms should use computational process models to derive precise measures of uncertainty and behavioral avoidance. Follow-up studies should investigate specifics about this relationship. Specifically, it is currently unclear whether uncertainty learning impairments result in avoidance only for fear-based disorders, versus for internalizing disorders more generally, and whether using disorder-specific stimuli (e.g., angry faces for people with social anxiety) has different effects than using generally aversive stimuli (e.g., losing points or receiving an electric shock).

Another area to test is whether differences in neural responses to threat in anxiety are due to impaired uncertainty processing. Overlapping brain regions and networks are implicated both in anxiety and uncertainty processing, particularly the salience and frontoparietal networks and regions including dorsal anterior cingulate cortex, vmPFC, insula, amygdala, BNST, and PAG. Neuromodulators, such as norepinephrine, serotonin, and dopamine, also are hypothesized to play roles in uncertainty processing and anxiety.

Clinically, alterations in uncertainty, and reductions in these alterations with successful exposure therapy, should correspond with several observations in anxiety. If anxious avoidance is due to altered uncertainty learning, uncertainty learning dysfunctions should increase with greater anxious avoidance as well as with more severe anxiety symptoms overall. As a readout of impaired uncertainty estimates, changes in avoidance should predict and correlate with overall symptom improvement and treatment success. Additionally, if changes in maladaptive uncertainty learning with treatment reduce disruptions in uncertainty estimates, the extent of changes in uncertainty learning with treatment should predict reduced risk of relapse after treatment. One issue with these proposed relationships is the difficulty of accurately measuring behavioral avoidance; future work should test if ecological momentary assessment, passive sensing, or other novel approaches can provide valid measurement of avoidance behavior (Craske & Tsao, 1999; Rashid et al., 2020).

A full theory of maladaptive aversive learning and avoidance in anxiety does not just allow a better understanding of current approaches; this theoretical framework can be used to test augmentations and novel treatments and to tailor treatment components to individual patients. Basic human and non-human neurobehavioral learning research can be used to 1) test components of normative learning and decision-making to better understand how normative learning occurs in MDPs with uncertain, aversive outcomes, and 2) given a specified learning difference in clinical anxiety, to test how this difference can be remediated. Interventions incorporating insights from basic research on potential learning targets would then be tested in people with clinical anxiety. Possible treatments could affect several targets: they could remediate disrupted uncertainty learning, correct miscalculated uncertainty, or encourage exploration of uncertain states in negative environments. Such treatments could include new psychotherapy techniques, including those that draw on basic learning research, as well as somatic and pharmacological approaches targeting these processes.

Maladaptive aversive uncertainty learning and avoidance also should be connected to other impairments in anxiety. Fear-based disorders show hyperarousal and attention bias to threat. These impairments could be due to altered calculations of uncertainty that cause the world to be less well predicted and, by moving along the threat imminence spectrum, appear more threatening. Normative studies show that uncertainty during learning tasks affects attention (Stojić et al., 2020; Walker et al., 2019; but see Wise et al., 2019), but this has not been studied in anxiety. The use of psychometrically valid measurements will be particularly important for assessing this relationship (Price et al., 2019; Rodebaugh et al., 2016; Woody et al., 2017). Understanding these relationships will lead to a fuller explanation of impairments in anxiety and how they can be jointly targeted.

This theoretical framework also enables links between dysfunctional uncertainty learning and risk factors for anxiety to be studied. Understanding how early life stress, perceptions of control, anxiety sensitivity, and other risk factors for anxiety (Zinbarg et al., 2022) potentiate, maintain, and are caused by disrupted uncertainty learning can show how these risk factors can be modified or buffered. Longitudinal studies will be especially important to understand the relationships between these risk factors and uncertainty learning and avoidance over time (Struijs et al., 2018; Wise et al., 2022).

These research directions illustrate the limits of our proposed framework: many components require further testing before this framework can influence clinical practice. In addition, we make assumptions based on the current state of the literature (e.g., the distinction between fear- and distress-based disorders), which will require updates as our knowledge in these areas evolves. We also do not account for aspects of anxiety symptoms and exposure therapy processes that are more transdiagnostic, such as the role that increasing distress tolerance has on the ability to stay in anxiety-provoking situations. Additionally, the function of exposure therapy in non-fear-based disorders, such as generalized anxiety disorder or eating disorders, is not explained by this framework.

More generally, our proposed framework exemplifies the benefits of drawing from both clinical knowledge and neurocomputational findings in computational psychiatry research. By integrating clinical knowledge from trial-and-error and verbal theories via clinical experience, lived experience, case reports, and clinical trials with basic knowledge from theoretical work and studies of basic behavior and neural function in humans and other organisms, we can identify gaps in knowledge and work to fill them, with the goal of improving treatment and prevention of psychiatric illness.

Conclusions

Exposure therapy and other treatments for anxiety show moderate success, but exposure therapy is based on theoretical assumptions (i.e., that pathological anxiety results from excessive learned threat) inconsistent with the empirical literature. Instead, uncertainty-related learning disruptions predominate in anxiety. Computational process models, particularly reinforcement learning and other error-driven learning models, precisely define different types of uncertainty and their effects on behavior. Research using these models suggests that anxiety is related to difficulty discerning irreducible uncertainty (uncertainty from noisy outcomes) from unexpected uncertainty (uncertainty from changes in the relationships between stimuli and outcomes). In instrumental learning, uncertainty affects learning and decision-making through effects on decisions to explore versus exploit. Uncertainty-sensitive, temporal difference, Markov decision processes can explain normative avoidance behavior, the relationship between disrupted uncertainty learning and avoidance in anxiety, and how treatments, such as exposure therapy, reduce avoidance through remediating uncertainty calculations. In turn, this framework provides avenues for further research to better understand anxiety and its treatment.