Introduction

Drug addiction has been characterized in terms of the brain’s learning and memory systems. This view posits a gradual shift from initial voluntary drug use to an increasing loss of control over drug intake, which becomes habitual or even compulsive [1,2,3,4]. Drug use starts out as a goal-directed behavior, mediated by the reinforcing and hedonic effects of the drug, but habitual processes eventually take over, hampering attempts to stop drug intake in spite of severely aversive consequences and conscious decisions to reduce consumption or to remain abstinent. This transition is considered to depend upon interactions between Pavlovian and instrumental learning processes [1, 3]. Habitual or even compulsive instrumental drug-taking behaviors are thought to be triggered by internal and external drug-associated cues, acute stress events or a priming drug dose [5], as well as by internal mood states [6]. In chronic drug users, conditioned drug cues may gain incentive salience through Pavlovian mechanisms, whereas alternative reinforcers lose relevance [7, 8]. Addictive behavior is also characterized by negative reinforcement, defined as drug taking that alleviates a distress-associated aversive emotional state, during withdrawal distress and early abstinence [9]. In the current review, we focus on the empirical evidence regarding these processes and their hypothesized underlying learning mechanisms in the development and maintenance of alcohol use disorder (AUD). For each process, we first briefly report the available behavioral paradigms and underlying neural structures, and then summarize the studies assessing participants with AUD and at-risk groups.

Instrumental Learning: Habitual Versus Goal-Directed Behavior

Behavioral Paradigms and Neural Circuitry

Instrumental learning can be controlled by both goal-directed and habitual processes [10, 11]. Under habitual control, action selection is driven by rather rigid stimulus–response (S-R) associations and led by past reinforcement [12]. Under the more computationally costly goal-directed control, the subject uses their knowledge about the response–outcome (R-O) contingency and the current incentive value of the outcome to guide behavior [10, 13, 14]. Goal-directed and habitual behaviors differ therefore in their sensitivity to changes in both the causal nature of the instrumental R-O relationship and the current value of the outcome [11, 15], with insensitivity to such changes considered a hallmark of habitual behavior [16].

Classically, sensitivity to changes in outcome value can be assessed through outcome devaluation. Briefly, these tasks consist of an instrumental learning stage, in which an action is paired with a desired outcome, followed by an outcome devaluation phase, e.g., through sensory-specific satiation, aversive conditioning or instruction, and a test done in extinction. On the other hand, sensitivity to changes in the causal R-O relationship can be assessed with contingency degradation tasks, in which the probabilities of receiving action-contingent outcomes and non-contingent outcomes are manipulated, so that the causal R-O relationship is degraded by increasing the latter. When both probabilities are equal, performing the action has no effect on the likelihood of the outcome, so that the net R-O contingency, and thus the causal status of the action, is zero [10, 17]. It has been extensively demonstrated that both animals and humans are sensitive to changes in outcome value, as reflected by decreased responding to devalued outcomes [10, 18,19,20,21], as well as to contingency degradation, with decreased responding to smaller R-O contingencies and, in humans, explicit judgments about the causal relationship between action and outcome that closely approximate instrumental behavior [10, 17, 22,23,24,25]. Indeed, outcome devaluation procedures are currently considered the strongest test of habitual behavior [26].
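The net R-O contingency described above is often written as a difference of two conditional probabilities. The following is a minimal sketch; the symbol \Delta P and the event labels R (response) and O (outcome) are notation assumed here for illustration rather than taken from the cited studies.

```latex
\Delta P = P(O \mid R) - P(O \mid \neg R),
\qquad
P(O \mid R) = P(O \mid \neg R) \;\Rightarrow\; \Delta P = 0
```

Under this notation, degrading the contingency corresponds to raising P(O | ¬R) toward P(O | R), which drives \Delta P toward zero.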

More recently, computational reinforcement learning theories have formalized habitual and goal-directed processes in terms of model-free (habitual) and model-based (goal-directed) control [13, 27]. This framework uses sequential Markov decision tasks, such as the so-called two-step task [28]. In this task, participants must perform two consecutive choices to obtain a reward: first-stage stimuli lead to different second-stage states with fixed probabilities, and second-stage stimuli are associated with slowly changing reward probabilities. While a model-free agent will repeat previously rewarded actions, a model-based agent will take both previous reward and the task’s transition structure into account. In the computational model, first-stage actions are computed according to both model-free temporal-difference learning and model-based reinforcement learning algorithms, which are typically weighted with a free parameter ω, with ω = 1 indicating pure model-based and ω = 0 pure model-free control. Performance of clinically healthy humans in the two-step task is consistent with a mixture of model-free and model-based behavior [28,29,30,31,32]. This task has been recently back-translated to animal research, showing a similar behavioral pattern in rodents [33,34,35]. Moreover, a recent study has shown that rodents initially only use outcomes to drive behavior, but recover the structure of the environment over the course of learning and also use it to make decisions [36]. Indeed, some studies suggest that both rodents and humans can display predominantly model-based behavior following overtraining in the two-step task [37, 38].
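The weighting of model-based and model-free first-stage values by ω can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the fitting procedure of any cited study: the function names, the transition probabilities of 0.7 and 0.3, the value tables, and the inverse temperature of 5.0 are all hypothetical, and terms such as choice perseveration and eligibility traces are omitted.

```python
import math


def model_based_values(transition_probs, second_stage_q):
    """First-stage model-based values: expected best second-stage value.

    transition_probs[a][s] is P(second-stage state s | first-stage action a).
    second_stage_q[s] lists the learned values of the actions available in s.
    """
    return [
        sum(p * max(second_stage_q[s]) for s, p in enumerate(row))
        for row in transition_probs
    ]


def hybrid_values(q_mb, q_mf, omega):
    """Weight model-based and model-free values; omega = 1 is purely model-based."""
    return [omega * mb + (1.0 - omega) * mf for mb, mf in zip(q_mb, q_mf)]


def softmax(values, beta):
    """Choice probabilities for inverse temperature beta (numerically stable)."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative numbers only (assumed, not taken from any cited data set).
transition_probs = [[0.7, 0.3], [0.3, 0.7]]
second_stage_q = [[0.2, 0.8], [0.4, 0.1]]
q_mf = [0.3, 0.6]  # hypothetical cached model-free values

q_mb = model_based_values(transition_probs, second_stage_q)
for omega in (0.0, 0.5, 1.0):
    q_net = hybrid_values(q_mb, q_mf, omega)
    print(omega, [round(p, 3) for p in softmax(q_net, beta=5.0)])
```

In this toy example the two systems disagree about the better first-stage action, so the preferred choice changes as ω moves from 0 to 1.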

In animals, there is solid evidence for the involvement of the orbitofrontal cortex (OFC), prelimbic prefrontal cortex (PFC), and dorsomedial striatum (DMS) in goal-directed behavior, and of the infralimbic PFC and dorsolateral striatum (DLS) in habitual behavior. Lesioning the OFC has been shown to reduce [39, 40], and stimulating it to enhance [41], goal-directed behavior in outcome devaluation tasks. OFC–amygdala [42] and OFC–striatum [43] disconnections have been shown to decrease sensitivity to outcome devaluation. Similarly, reversible inactivation of the OFC also impairs model-based choices in two-step tasks [37]. Prelimbic lesions have been shown to impair sensitivity to both outcome devaluation [10, 44,45,46] and contingency degradation [10, 45, 47] in rodents. In contrast, infralimbic lesions result in marked sensitivity to outcome value [46]. Similarly, DMS lesions abolish sensitivity to outcome devaluation and contingency degradation [48], whereas DLS lesions increase sensitivity to outcome value [21, 49]. Rodent research has also pointed to a role of basolateral amygdala [40], dorsal hippocampus [37], and anterior cingulate cortex (ACC) [33] in goal-directed/model-based control. Although ventromedial PFC (vmPFC) and subgenual ACC have been suggested as homologous to the rodent prelimbic [11] and infralimbic PFC [50], respectively, and anterior caudate and posterior putamen as homologous to the rodent DMS and DLS, respectively [11], the clear dissociations observed in rodents are yet to be replicated in humans.

In humans, medial OFC (mOFC) and vmPFC have been highlighted as key regions driving goal-directed behavior in outcome devaluation tasks [19, 51, 52], with white matter tract integrity between caudate and vmPFC predicting goal-directed behavior, as reflected by increased sensitivity to instructed devaluation [52]. In contrast, devaluation insensitivity has been associated with increased subgenual ACC and ventral striatal (VS) blood-oxygen-level-dependent (BOLD) responses during S-R compared with R-O learning [50]. However, although vmPFC has been suggested to encode the probability of action-contingent outcomes [17, 53], recent studies have found vmPFC lesioned patients to be insensitive to outcome devaluation but not to contingency degradation [54, 55]. With the two-step task, model-free and model-based valuations consistently display overlapping neural signatures implicating both vmPFC and VS [28, 30, 31]. Some studies have implicated brain structures beyond medial prefrontal and striatal regions, suggesting that inferior frontal gyrus, dorsolateral PFC, hippocampus, and inferior parietal lobule might also be crucial for goal-directed/model-based control [17, 50, 56,57,58,59].

Findings in Alcohol Use Disorder and At-Risk Populations

In a seminal study, Dickinson et al. [60] posited alcohol seeking as an S-R habit by demonstrating that, in contrast to lever pressing for food pellets, lever pressing for ethanol was insensitive to devaluation in rodents. Moreover, after an extended period of self-administration, alcohol consumption continues even when the alcohol is mixed with aversive concentrations of quinine [61, 62]. Subsequent studies have shown that chronic intermittent ethanol exposure results in habitual alcohol seeking in rodents, as assessed with both outcome devaluation [63, 64] and contingency degradation [65] procedures. Alcohol also affects rodent goal-directed behavior more broadly, with acute intoxication [66], chronic ethanol exposure [67], and contextual conditioning to alcohol [68] all decreasing sensitivity to devaluation of non-alcoholic outcomes. Interestingly, the expression of alcohol-related habits appears to be sex and age dependent. Research has shown that chromosomal male, but not female, rodents become insensitive to alcohol devaluation [69] and that male adult, but not adolescent, rodents become insensitive to contingency degradation with alcohol outcomes after overtraining [70]. A recent study showed an intriguing interaction between sex and age at alcohol exposure. Barker et al. [71] demonstrated that exposure to alcohol during adulthood, but not during adolescence, impaired adult male rats’ sensitivity to the value of sucrose solution in action-promoting reinforcement schedules. In contrast, in female rats this impairment was observed only in those exposed during adolescence, whereas those exposed in adulthood remained sensitive to outcome devaluation in both action- and habit-promoting schedules. Dovetailing with the lesion results described above, operant responding for alcohol in rodents is initially goal-directed and driven by the DMS, which exhibits increased firing following alcohol reinforcement [21, 72]. In contrast, the DLS shows phasic activity time-locked to lever presses for alcohol self-administration [72], and overtraining results in a shift to DLS control and insensitivity to outcome devaluation [21]. Further studies have demonstrated that habitual alcohol seeking depends on glutamatergic inputs to the DLS and D2 receptors within the DLS, with infusion of a D2 receptor antagonist restoring sensitivity to devaluation of alcohol [73], and that chronic ethanol exposure induces long-lasting changes in OFC excitability and OFC–DMS transmission that contribute to the loss of goal-directed control [67]. Indeed, treatments that decrease DLS function and/or output and that increase OFC activity have been reported to restore goal-directed behaviors [21, 67, 73].

In humans, studies investigating the relationship between AUD and the balance between goal-directed and habitual behavior are limited (Table 1). Sjoerds et al. [74•] used an instructed outcome devaluation task [51] to assess the behavior of recently detoxified AUD participants. Although patients with AUD and healthy controls did not differ in their instrumental learning performance, AUD participants already displayed increased posterior putamen and decreased vmPFC activity during this phase. In the outcome-devaluation test, Sjoerds et al. [74•] reported impaired R-O knowledge in AUD, as reflected by choice behavior, with decreased activity in both vmPFC and anterior putamen, regions implicated in goal-directed control, and increased activity in the posterior putamen, an area critical for habit learning [11]. Moreover, the authors modified the task to include alcohol-related pictures in addition to fruit images, but observed no differences between stimulus types, suggesting a shift toward habitual behavior in AUD that is not specific to addiction-relevant stimuli. In contrast, in a recent study using an aversion-induced outcome devaluation task [76], van Timmeren et al. [75] found no reduction in devaluation sensitivity, taken as a marker of goal-directed control, in recently detoxified AUD patients compared with healthy controls.

Table 1. Selected studies investigating habitual, Pavlovian, and PIT processes in AUD and at-risk populations

Other studies have employed the two-step task devised by Daw et al. [28]. Comparing recently detoxified, abstinent AUD participants with healthy controls, Sebold et al. [77] reported that, although both groups displayed a mixture of model-free and model-based choice behavior, AUD patients exhibited less model-based control than healthy controls following non-rewards, but did not differ following rewards. However, this finding was not replicated in a subsequent study, in which AUD participants were divided into those who abstained and those who relapsed to alcohol at a follow-up assessment [78••]. Neither model-based choice behavior nor the computational parameter ω predicted group membership on its own; however, patients in whom poorer model-based control was combined with higher alcohol expectancies had a higher relapse risk at follow-up. At the neural level, participants who relapsed to alcohol showed blunted mPFC activity associated with model-based control, whereas the authors found no differences between the groups for model-free learning signals. This study not only underlines the association between mPFC and goal-directed deficits in AUD but further suggests that decreased model-based control might predict poor treatment outcome only in combination with high alcohol expectancies. Similarly, Voon et al. [79] found no differences in ω between long-term abstinent AUD participants and healthy controls. However, they also observed that more prolonged abstinence was associated with greater ω values, indicating more model-based control and suggesting that goal-directed behaviors might improve with abstinence. Together, the studies of Sebold et al. [78••] and Voon et al. [79] indicate that model-based control may be relevant for abstinence, both as an indicator of prospective treatment outcome and as a correlate of retrospective abstinence duration.

Additional studies have also investigated the relationship between alcohol use and the balance between model-free and model-based control in samples with no known diagnosis of AUD, obtaining mixed results. Greater alcohol consumption and having had binge drinking episodes were not found to be associated with model-free/model-based control in 18-year-olds [31], dovetailing with preclinical results showing that adolescent rats did not express alcohol-related habits [70]. However, severe binge drinkers, who had had at least one binging episode per week for the previous 6 months, have been reported to display reduced model-based control compared with healthy controls [80]. Two large online studies have explored the association between problematic alcohol use, assessed with the Alcohol Use Disorder Identification Test (AUDIT [81, 82]), and model-based/model-free behavior. Gillan et al. [83] reported that higher AUDIT scores were associated with decreased model-based choice behavior, as well as with lower βMB scores. (Recent reformulations of the computational model used to analyze the two-step task no longer use the weighting parameter ω as a measure of the relative balance between model-based and model-free behavior, but rather separate inverse temperature parameters βMB and βMF, which are algebraically equivalent to the original formulation under the substitution βMB = ωβ and βMF = (1 − ω)β [see, e.g., 83, 84].) They further observed that this association was steeper among “putative patients” (defined as those scoring in the top 25% on the AUDIT). Dovetailing with previous studies, model-free measures were not related to questionnaire scores. Using a deterministic two-step variant [85], however, Patzelt et al. [86] found no association between ω and AUDIT scores.
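The equivalence between the ω-weighted formulation and the separate inverse temperatures βMB and βMF can be made explicit with a short derivation. This is a sketch under simplifying assumptions: the net value Q_net and the softmax choice rule are written in generic form, and additional terms used in fitted models (for example, choice perseveration) are omitted.

```latex
\begin{aligned}
Q_{\mathrm{net}}(a) &= \omega\, Q_{\mathrm{MB}}(a) + (1-\omega)\, Q_{\mathrm{MF}}(a) \\
P(a) &\propto \exp\!\big(\beta\, Q_{\mathrm{net}}(a)\big)
      = \exp\!\big(\omega\beta\, Q_{\mathrm{MB}}(a) + (1-\omega)\beta\, Q_{\mathrm{MF}}(a)\big) \\
     &= \exp\!\big(\beta_{\mathrm{MB}}\, Q_{\mathrm{MB}}(a) + \beta_{\mathrm{MF}}\, Q_{\mathrm{MF}}(a)\big),
\qquad \beta_{\mathrm{MB}} = \omega\beta,\quad \beta_{\mathrm{MF}} = (1-\omega)\beta \\
\beta &= \beta_{\mathrm{MB}} + \beta_{\mathrm{MF}},
\qquad \omega = \frac{\beta_{\mathrm{MB}}}{\beta_{\mathrm{MB}} + \beta_{\mathrm{MF}}}
\end{aligned}
```

The last line shows how ω and β can be recovered from the two inverse temperatures, provided their sum is nonzero.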

Overall, both preclinical and clinical studies have demonstrated AUD-related changes in the fronto-striatal networks that support goal-directed and habitual behavior. Studies with AUD patients suggest deficits in goal-directed or model-based action control that are, however, heavily influenced by factors such as abstinence and alcohol expectancies.

Pavlovian Learning Mechanisms

Behavioral Paradigms and Neural Circuitry

During Pavlovian conditioning, initially neutral stimuli become conditioned stimuli (CS) through repeated pairing with an unconditioned stimulus (US) [87]. As a consequence, the CS elicits a variety of conditioned responses (CRs) originally provoked by the US. Repeatedly presenting these cues in extinction, i.e., without the US, weakens CRs by establishing a new, inhibitory CS–noUS association that henceforth competes for behavioral expression with the original association [88]. This duality explains several Pavlovian relapse phenomena, where CRs recover under certain conditions [89]. In animals, the concepts of conditioned approach [90,91,92] and place preference [93,94,95] have been studied extensively in the context of addiction. In place conditioning protocols, distinct chambers (CS) are paired with the administration of either a US or noUS. In the test phase, animals are given free access to both chambers; those that develop conditioned place preference will spend more time in the US-paired context. Rewarding outcomes, including sex [96], food [97,98,99,100], fluids [100], and numerous drugs of abuse [for a review, see 95], have been shown to reliably induce conditioned place preference, which can be reinstated after extinction [95, 101], but can be abolished by devaluing the US [100, 102]. In Pavlovian lever autoshaping, the insertion and retraction of a lever (CS) signals food (US) delivery, irrespective of behavior. Some rats (sign-trackers) predominantly approach and interact with the Pavlovian cue, i.e., the lever, whereas others (goal-trackers) consistently approach the location of food delivery [103,104,105,106]. For sign-tracking rats, the CS appears to be attributed with incentive salience [107,108,109] and can even effectively reinforce new instrumental learning [107]. Following US devaluation, goal-trackers decrease both cue- and outcome-directed behaviors, whereas sign-trackers continue responding to cues [110,111,112]. 
Interestingly, in contrast to instrumental devaluation studies [46, 113], sign-tracking rats become sensitive to outcome devaluation with extended autoshaping training [110, 114, 115].

Most Pavlovian procedures used in humans were initially developed in animals. However, research attempting to translate the two paradigms and concepts described above is limited. Human studies often employ differential conditioning protocols in which one stimulus (CS+) is paired with the US while a second (CS−) is not. Both appetitive and aversive CRs have been quantified on various response systems, including subjective ratings, psychophysiological measures, and neuroimaging [116,117,118,119,120]. Recent research has found evidence of behaviors similar to sign- and goal-tracking in healthy participants. Using eye-tracking, Garofalo and di Pellegrino [121] and Schad et al. [122••] demonstrated that, during Pavlovian conditioning, some participants gazed more often toward the reward-predicting cues and others toward the location where money would be delivered, consistent with sign- and goal-tracking behavior, respectively. Although Pavlovian conditioning was equally successful in both groups [121], cues influenced instrumental behavior more strongly in sign-tracking individuals [121, 122••], consistent with animal findings [107]. Using a computational model, Schad et al. [122••] determined that goal-trackers relied strongly on model-based state prediction errors, whereas sign-trackers exhibited a neural reward prediction error signal. Attempts to closely adapt Pavlovian lever autoshaping paradigms for humans are also under way [e.g., 123; for a review, see 124]. The conditioned place preference model has also been translated to humans, frequently using virtual reality or computer avatars. These studies have demonstrated that participants show both implicit and explicit preference for rooms previously paired with primary [125, 126] and secondary reinforcers [127,128,129], as well as drugs of abuse [130, 131].

Preclinical work suggests that largely overlapping neural circuits are involved in Pavlovian learning processes, including OFC [132], dorsolateral PFC [97], nucleus accumbens (NAcc) [133], subthalamic nucleus [134], amygdala [135,136,137,138,139,140], hippocampus [139], and insula [138], which are widely preserved across species [141, 142]. Recent work has shown the relevance of adrenergic, cannabinoid receptor, and NMDA signaling for Pavlovian conditioning [99, 143,144,145,146]. Still, the seminal work of Schultz et al. [147, 148] demonstrated the critical role of dopamine by showing a shift in dopaminergic firing from the US to the CS over the course of conditioning. Subsequent work has shown that phasic dopamine release in the NAcc matches reward prediction error signals in sign-tracking rats, whereas goal-tracking rats do not show a decline in US-evoked dopaminergic release despite exhibiting conditioned approach [149]. Dopamine has thus been posited to play a role in incentive salience attribution rather than in S-R learning itself [150]. A recent computational model has accounted for this individual variation in Pavlovian conditioned approach behavior and dopaminergic release patterns [151], and its predictions were supported experimentally [152]. This model accounts for the development of distinct CRs in rodents through a combination of a model-based system and a feature-model-free system, a revised model-free system that uses factored representations [151, 153, 154].
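The shift of the prediction error signal from the US to the CS described above is commonly illustrated with a temporal-difference simulation. The following Python sketch is a generic textbook-style illustration and not the model of the cited studies; the function name, the five time steps between CS onset and US, the learning rate, the discount factor, and the assumption that the pre-CS state has a value of zero (because CS onset is unpredictable) are all choices made here for illustration.

```python
def simulate_td(n_trials=200, n_steps=5, alpha=0.2, gamma=1.0):
    """Track prediction errors at CS onset and at US delivery across trials.

    values[i] is the learned value of the i-th time step after CS onset.
    The US (reward of 1.0) arrives at the last time step of every trial.
    Returns a list of (delta_cs, delta_us) tuples, one per trial.
    """
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        # Error at CS onset: the pre-CS state is assumed to have value 0.
        delta_cs = gamma * values[0] - 0.0
        delta_us = 0.0
        for step in range(n_steps):
            is_last = step == n_steps - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[step + 1]
            delta = reward + gamma * next_value - values[step]
            values[step] += alpha * delta  # TD(0) update
            if is_last:
                delta_us = delta
        history.append((delta_cs, delta_us))
    return history


history = simulate_td()
for trial in (0, 10, 50, len(history) - 1):
    delta_cs, delta_us = history[trial]
    print(trial, round(delta_cs, 3), round(delta_us, 3))
```

In this sketch the error at the US starts large and shrinks with training, while the error at CS onset grows, which mirrors the qualitative pattern attributed to sign-tracking rats; it does not capture the goal-tracking pattern.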

Human neuroimaging studies have repeatedly identified activity within dopaminergic midbrain, VS (including NAcc), OFC, dorsal ACC, and amygdala during appetitive Pavlovian conditioning [120, 155,156,157,158,159,160,161,162; for a meta-analysis, see 163]. In line with preclinical evidence showing that the infralimbic PFC promotes extinction recall, regulating Pavlovian relapse phenomena [141, 164], the vmPFC is considered particularly relevant for the recall of extinction memory, in concert with striatum and amygdala [165,166,167,168], and the inhibition of appetitive responses [117, 169]. Research further suggests that Pavlovian value signals are encoded within vmPFC and OFC [158, 170, 171], as well as the VS [156, 160, 172]. Indeed, VS activity has been found to shift from US to CS over the course of learning [172] and to be consistent with reward prediction error signals [162, 172]. A recent study, however, reported NAcc model-free reward prediction error activity only in those individuals classified as sign-trackers [122••], dovetailing with prior animal findings [149]. A few studies have further suggested that the dorsal striatum might not only be involved in instrumental [11] but also in Pavlovian conditioning [158, 160], supporting model-based inference, i.e., representing a cognitive map of Pavlovian contingencies even in the absence of action, as reflected by participants’ explicit contingency knowledge [160]. In a similar vein, the amygdala has been found to be engaged in model-based inference during Pavlovian conditioning [173], although prediction error signals in this structure have also been reported to be stronger in sign- than goal-tracking participants [122••].

Findings in Alcohol Use Disorder and At-Risk Populations

Pavlovian lever autoshaping procedures have been reported to induce high volumes of ethanol drinking in rodents and have been suggested as an animal learning model of AUD [91]. Using an alcoholic US not only induces sign-tracking behavior in rats [174,175,176] but also shifts conditioned approach behavior from goal- to sign-tracking over the course of training [177, 178], demonstrating how Pavlovian alcohol cues become powerful incentive stimuli. Moreover, sign-tracking has also been shown to enhance operant responding for alcohol [174, 179]. Importantly, exposure to alcohol during adolescence has been reported to blunt goal-tracking behavior [180] and increase sign-tracking behavior in adulthood [181, 182]. Indeed, exposure to alcohol induces changes in the dopaminergic system, with increased phasic dopamine signaling both to the CS [178] and to positive prediction errors [182]. Regarding ethanol-induced place conditioning, mice show robust conditioned place preference [183,184,185] (but results with rats are mixed [for a review, see 95]) that can be facilitated by stress [186,187,188] and is also prone to Pavlovian relapse effects like reinstatement [101, 189]. Striatal dopamine [190], cannabinoid-1 receptor [190, 191], and noradrenergic signaling [192] have been implicated in the acquisition of ethanol-induced conditioned place preference. Both NAcc and amygdala have been shown to be relevant for the acquisition and expression of ethanol-induced conditioned place preference [185, 193], with antagonism of NAcc NMDA receptors blocking conditioned place preference expression [185] and the NMDA-receptor partial agonist d-cycloserine interfering with reconditioning, but having no effect on extinction [194]. Several studies have demonstrated that naloxone facilitates the extinction of ethanol-induced conditioned place preference [189, 195, 196], even to the point of generating a weak conditioned place aversion [196], suggesting that opioid receptor activation also mediates the motivational effects of alcohol. Indeed, injections of a delta-opioid receptor antagonist in the central amygdala can reverse ethanol-induced conditioned place preference [197]. In a compelling design, Cunningham and Patel [198] used a modified place conditioning paradigm to assess conditioned approach by introducing a visual cue in the conditioning chamber. Here, mice showed a strong preference for the location of the visual cue that had been associated with intraperitoneal ethanol injections, a behavior consistent with sign-tracking.

Although translation of the animal concepts and methodologies described above is limited, human studies have addressed related constructs, such as cue reactivity [90, 124] (Table 1). In individuals diagnosed with AUD, the presentation of alcohol-associated cues, e.g., the sight or smell of an alcoholic beverage, has been shown to bias approach tendencies [199,200,201] and attention [202,203,204], with increased attentional bias predicting relapse risk [205, 206], and to induce conscious craving [204, 207,208,209,210,211,212]. Passive viewing tasks, in which participants are shown alcohol cues (images of alcoholic beverages), affectively neutral images, and abstract images, have been widely used to study cue reactivity in AUD patients. Alcohol cues evoke a number of psychophysiological responses, including increased salivation [213], changes in heart rate variability [203, 206, 209, 214], and larger pupillary dilation [215], some of which have also been associated with higher relapse probability at follow-up [206, 213, 215]. Moreover, in AUD patients, alcohol cues elicit activity within limbic and prefrontal structures involved in incentive salience attribution and reward processing, including mPFC, OFC, ACC, posterior cingulate cortex, and striatum [212, 216,217,218,219,220,221,222]. Interestingly, AUD duration has been shown to correlate with activation of the posterior putamen [212], an area related to habitual control [11]. Moreover, increased frontal activation has been related to decreased dopamine receptor availability in the VS [217], and these cue-elicited fronto-striatal responses have been shown to predict subsequent craving and relapse [218,219,220]. Specifically, alcohol-cue reactivity in the VS has been suggested as a prognostic factor for relapse in AUD patients [219].

Dovetailing with results in clinical samples, cue reactivity studies with social drinkers have also reported increases in attentional bias [223, 224], heart rate variability [225], and craving [226,227,228] in response to alcohol cues in heavy drinkers. Interestingly, Roy-Charland et al. [223] observed that participants who consumed more alcohol performed more frequent saccades into and out of alcohol-related image parts, a behavior reminiscent of sign-tracking. Neuroimaging studies have further highlighted the relevance of fronto-striatal circuits, showing how heavy and light alcohol use modulate PFC, ACC, and ventral and dorsal striatal responses to alcohol-related cues [229,230,231]. While cue reactivity paradigms make use of “naturally”, idiosyncratically conditioned cues, only a handful of studies have so far investigated de novo alcohol conditioning in social drinkers [232•, 233,234,235,236]. Specifically, neutral cues experimentally paired with low to moderate doses of alcohol have been associated with increased skin conductance [233, 236] and greater attentional capture [234, 236] compared with a CS−, with attentional bias being positively related to participants’ self-reported liking of alcohol [234]. In a neuroimaging study, visual background stimuli associated with intravenous alcohol (CS+) compared with saline infusion (CS−) were found to evoke BOLD responses in frontoparietal and orbitofrontal regions, ACC, and insula [235]. However, CS+-elicited BOLD responses were unrelated to recent drinking or other risk factors of AUD, such as a family history of the disorder, and no behavioral conditioning effect could be observed in a reaction time task [235]. A single study has examined alcohol-induced place conditioning in humans [232•]. In a multi-session set-up, heavy social drinkers received alcoholic and non-alcoholic drinks in two distinct rooms, respectively. At test, participants preferred the room previously associated with alcohol consumption over the non-alcohol-associated room, i.e., they displayed a behavior consistent with conditioned place preference. This effect, however, was independent of explicit awareness of context contingency, suggesting that alcohol cues influence behavior irrespective of drug awareness [232•]. Of note, alcohol conditioning had no effect on subsequent free choice behavior [234, 236], raising questions about when and how the presence of alcohol-associated cues becomes behaviorally relevant.

In summary, this line of research provides evidence that both AUD patients and social drinkers attribute incentive salience to alcohol cues and that conditioned incentive properties may develop largely outside of the participant’s awareness. While alcohol-paired cues consistently increase neural and psychophysiological responses in individuals diagnosed with AUD, only a few studies have addressed the process of de novo alcohol conditioning in humans.

Influence of Pavlovian Cues on Instrumental Responding: Pavlovian-to-Instrumental Transfer

Behavioral Paradigms and Neural Circuitry

Drug-related cues do not only have high impact on psychological and neurophysiological reactions but can also directly influence the motivation to perform certain behaviors. The Pavlovian-to-instrumental transfer (PIT) test has been used to assess the impact of Pavlovian cues on instrumental behavior. Here, positively valued Pavlovian cues enhance instrumental approach behavior [for a review, see 237], while negatively valued Pavlovian cues attenuate instrumental approach behavior [e.g., 238]. Numerous animal studies have assessed PIT [239], typically using a three-stage experimental design [237]: in a first Pavlovian conditioning stage, the animal is presented with a neutral stimulus that is paired with a positive reinforcer, becoming an appetitive CS+; in a second instrumental training stage, the animal learns via trial and error to press a lever to receive a desired outcome; and in a final transfer stage, the animal is confronted with the lever (in extinction) and either the CS+ or no cue. The typical PIT effect observed during the transfer stage is an increase in lever presses in trials with the CS+ compared with trials with no cue. So-called full PIT paradigms use several US types to distinguish between general and outcome-specific PIT effects. In general PIT, Pavlovian cues impact instrumental performance irrespective of the associated reward, e.g., ethanol-paired Pavlovian cues can have a general excitatory effect on reward-seeking behavior in rats, affecting both ethanol-associated and sucrose-associated lever pressing [240]. In contrast, in outcome-specific PIT, the impact of Pavlovian cues on instrumental performance is directly linked to the associated reward, e.g., sucrose-associated Pavlovian cues selectively elevate sucrose-directed but not ethanol-directed lever pressing [240]. Several theories have attempted to explain the transfer effect [for a review, see 237]. 
Initial theories posited that the CS+ elicits a general increase in motivational arousal and activates the memory of the sensory-specific features of the outcome [241,242,243]. More recent theories include the associative-cybernetic model, which posits an S-O, O-R chain through associative and S-R memories as well as a general enhancement of instrumental actions [244], and hierarchical models, which postulate that the CS enhances instrumental responding because of its predictive value through hierarchical CS-(R-O) associations [245,246,247].
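
To make the behavioral measures described above concrete, the following minimal Python sketch scores a transfer test under simplifying assumptions. It is not the scoring procedure of any cited study: the names `Trial`, `pit_effect`, `cue_elevation`, and `specificity_index`, as well as all press counts, are hypothetical and chosen for illustration only. The basic PIT effect is taken as the difference in mean lever presses between CS+ and no-cue trials, and the general versus outcome-specific distinction is illustrated by comparing cue-induced elevation on the lever that earns the cued outcome with elevation on the other lever.

```python
from statistics import mean
from typing import Dict, List, NamedTuple


class Trial(NamedTuple):
    """One transfer-stage trial (hypothetical data structure)."""
    cue: str       # "CS+" or "none"
    presses: int   # lever presses during the trial (test run in extinction)


def pit_effect(trials: List[Trial]) -> float:
    """Mean presses in CS+ trials minus mean presses in no-cue trials."""
    with_cue = [t.presses for t in trials if t.cue == "CS+"]
    no_cue = [t.presses for t in trials if t.cue == "none"]
    if not with_cue or not no_cue:
        raise ValueError("need at least one CS+ trial and one no-cue trial")
    return mean(with_cue) - mean(no_cue)


def cue_elevation(cue_presses: Dict[str, float],
                  baseline_presses: Dict[str, float]) -> Dict[str, float]:
    """Per-lever increase in mean presses under a cue relative to baseline."""
    return {lever: cue_presses[lever] - baseline_presses[lever]
            for lever in cue_presses}


def specificity_index(elevation: Dict[str, float], cued_outcome: str) -> float:
    """Elevation on the lever earning the cued outcome minus the mean
    elevation on all other levers. Values near zero with positive elevation
    everywhere resemble a general pattern; large positive values resemble
    an outcome-specific pattern."""
    others = [v for lever, v in elevation.items() if lever != cued_outcome]
    return elevation[cued_outcome] - mean(others)


# Hypothetical single-lever transfer test.
trials = [Trial("CS+", 12), Trial("CS+", 10), Trial("none", 6), Trial("none", 8)]
basic_effect = pit_effect(trials)

# Hypothetical two-lever test; levers are named after the outcome they earn.
baseline = {"ethanol": 8, "sucrose": 8}
ethanol_cue = cue_elevation({"ethanol": 14, "sucrose": 13}, baseline)
sucrose_cue = cue_elevation({"ethanol": 8, "sucrose": 15}, baseline)
ethanol_specificity = specificity_index(ethanol_cue, "ethanol")
sucrose_specificity = specificity_index(sucrose_cue, "sucrose")
```

In this invented example, the ethanol cue raises responding on both levers by a similar amount (a general pattern), whereas the sucrose cue raises responding only on the sucrose lever (an outcome-specific pattern). Real analyses typically model trial-level data statistically rather than comparing raw means.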

The PIT paradigm has also been used in humans, with evidence for both general and outcome-specific PIT effects [237, 248,249,250]. Huys et al. [251] showed distinct effects of appetitive Pavlovian stimuli enhancing approach and inhibiting withdrawal behavior, while aversive stimuli showed the opposite result. These effects were independent of reinforcer presentation delay, which was interpreted as a disruption in goal-directed instrumental control by Pavlovian cues [252]. PIT effects in humans have been shown to be insensitive to outcome-devaluation [253] (although see satiety effects in PIT-related NAcc activation in animals [254]), but sensitive to extinction, although this was less effective for reducing PIT in a different context [255]. Moreover, reduced working memory capacity has been reported to attenuate outcome-specific but not general PIT [256].

On a neural level, animal studies have shown that dopaminergic neurotransmission in subcortical areas, specifically within the NAcc, is crucial for general PIT [257,258,259]. However, satiety attenuated PIT-related NAcc dopaminergic responses [254], emphasizing the role of this region in cue-motivated behavior [for a review, see 260]. Recent work has also highlighted the role of striatal cholinergic transmission [261,262,263,264] and NMDA receptor–mediated signaling [146] for cue-triggered instrumental behavior. Both NAcc and amygdala appear to be essential structures underlying the PIT effect, with double dissociations reported for both regions. Whereas the NAcc core [265] and the central nucleus of the amygdala [242, 266, 267] mediate general PIT, the NAcc shell [265, 268] and basolateral amygdala [242, 269] mediate outcome-specific transfer. Moreover, research suggests that the basolateral amygdala encodes S-O associations and relays this information to the NAcc to mediate goal-directed behavior [270]. The ventral tegmental area [241, 271], DLS [272], infralimbic [273] and medial PFC [274], and OFC [274, 275] have been related to general and/or outcome-specific PIT effects. Indeed, outcome-specific PIT is thought to rely on the interactions between medial ventral pallidum, NAcc shell, mediodorsal thalamus, and VTA [271, 276], and inactivation of projections from basolateral amygdala to OFC impairs outcome-specific PIT, suggesting they enable cue-triggered reward expectations that drive goal-directed behavior [277].

Neuroimaging studies in humans have reported the involvement of structures similar to those identified in rodent research. PIT-related activation has been found within NAcc, putamen, insula, and amygdala [238, 248, 278,279,280,281]. Moreover, dopamine depletion [282] and dopamine antagonists [283] have been shown to reduce the influence of appetitive Pavlovian cues on instrumental responses.

Findings in Alcohol Use Disorder and At-Risk Populations

Preclinical research has demonstrated that alcohol-predictive cues produce a general PIT effect, increasing performance to obtain both alcoholic and non-alcoholic outcomes [240, 284,285,286]. However, others have reported alcohol-specific PIT effects when both the alcoholic and non-alcoholic outcomes are concurrently available during the transfer test [287, 288]. The influence of Pavlovian cues on instrumental responding has been shown to increase with longer instrumental training [284], but can be abolished if Pavlovian extinction is conducted prior to the transfer test [287]. As with primary reinforcers, rodent research has shown that the reconsolidation of CS-alcohol memories underlying PIT is mediated by NMDA-receptor neurotransmission [289], and has revealed differentiated roles for NAcc shell and core, with core inactivation reducing the general PIT effect induced by alcohol and shell inactivation selectively reducing outcome-specific PIT [285].

Only a few studies have investigated PIT in AUD patients (Table 1). Garbusow et al. [290, 291] investigated general PIT with both monetary (non-drug) and alcohol-related cues, observing enhanced non-drug PIT effects in AUD patients. Specifically, patients failed to inhibit approach behavior when simultaneously confronted with positively valued CS, an effect that was even more pronounced in impulsive AUD patients compared with non-impulsive patients and impulsive healthy controls [292]. This effect further predicted relapse at 1-year follow-up [293]. Non-drug PIT effects were associated with NAcc activity, predicting relapse at 3-month follow-up [291]. Similarly, high- compared with low-risk social drinkers also showed stronger non-drug PIT effects, associated with amygdala activation and a cumulative genetic risk for alcohol-related problems [294•].

Conversely, AUD patients compared with healthy controls had a lower general PIT effect elicited by alcohol versus water cues [292, 295], suggesting an inhibition of instrumental performance during alcohol-associated trials. Interestingly, this effect was associated with NAcc activity, and both behavioral and neural effects were driven by patients classified as abstainers at follow-up, who displayed increased NAcc activity at 6-week follow-up and stronger behavioral inhibition at 6-month follow-up compared with both healthy controls and individuals that relapsed to alcohol [295]. This rather surprising result dovetails with a study reporting enhanced NAcc and mPFC activation elicited by alcohol versus neutral cues in patients who relapsed at 3-month follow-up [220]. These results are further substantiated by a multivoxel classification scheme showing that alcohol PIT-related mPFC activity predicted relapse in AUD patients and alcohol intake in social drinkers at 1-year follow-up [296•]. In contrast, studies using neutral primary reinforcers (water, chips, snacks) in PIT tasks have found no differences between AUD patients and healthy controls [75, 297]. Research in subclinical populations, including social drinkers, has found no association between alcohol-specific PIT effects and drinking measures (e.g., AUDIT scores) [298,299,300].

Taken together, these results suggest that strong general PIT effects on approach behavior may increase the risk for alcohol intake in at-risk and AUD participants. However, PIT tasks that use alcohol cues have revealed group differences among AUD patients, with individuals that are able to abstain displaying an inhibitory effect of alcohol cues on approach behavior, whereas individuals that subsequently relapse do not differ from healthy controls. This is in line with the hypothesis that PIT effects may be modulated by goal-directed control [297].

Discussion and Outlook

Basic learning mechanisms, including Pavlovian and instrumental processes, are crucial for understanding the development and maintenance of AUD. Preclinical research has extensively demonstrated that alcohol and alcohol-paired cues heavily influence behavior and induce long-lasting changes in brain circuitry. Animal models provide evidence that alcohol seeking starts as goal-directed behavior, driven by the DMS [21, 72], but through overtraining becomes consistent with an S-R habit, with behavior that persists despite negative consequences and is driven by the DLS [21, 60,61,62, 73]. Moreover, ethanol infusions cause animals to develop conditioned place preference [183,184,185] and pairing neutral cues with alcohol outcomes induces sign-tracking, whereby animals interact preferentially with the Pavlovian cues [174,175,176,177,178], a behavior that also increases operant responding for alcohol [174, 179]. Indeed, alcohol-predicting Pavlovian cues have been shown to elicit a general PIT effect, increasing responses to obtain both alcoholic and non-alcoholic outcomes [240, 284,285,286]. As with instrumental conditioning, these Pavlovian processes depend heavily on striato-limbic circuitry, especially NAcc and amygdala [185, 193, 197, 285], as well as NMDA receptor and dopaminergic signaling [178, 182, 185, 194, 289].

Translation of animal research to humans remains, however, challenging. Findings are less clear-cut than in animal studies due, at least in part, to heterogeneity in the paradigms used, the measures of conditioning employed (implicit physiological, explicit self-report or neuronal), the level of awareness about the conditioning procedure, and modifying factors such as comorbidities, AUD severity and duration, or context (e.g., enrollment in treatment programs or recent detoxification). This notwithstanding, alcohol cues have been repeatedly shown to induce attentional and psychophysiological changes and increase craving in individuals with AUD [202,203,204, 206,207,208,209,210,211,212,213,214,215], as well as social drinkers [223,224,225,226,227,228]. However, in contrast to animal research, conditioning of alcohol cues in human studies has usually taken place outside of the experimenter’s control and is thus subject to an indeterminate number of potential confounders. Still, recent attempts at de novo Pavlovian conditioning with alcohol, a procedure more similar to preclinical methods, have proven successful, underlining the role of a number of limbic and prefrontal structures [235] and demonstrating alcohol-induced conditioned place preference in social drinkers [232•].

In the instrumental domain, individuals with AUD exhibit decreased goal-directed/model-based control [74•, 77], as well as decreased mPFC and increased dorsal striatal activity [74•, 78••], consistent with rodent studies showing that ethanol exposure results in fronto-striatal changes that contribute to the loss of goal-directed control [67]. However, not all studies have observed differences in model-based versus model-free control [78••, 79]. Translation of research in animals to humans also remains a major challenge in instrumental learning studies. The overtraining-induced shift from goal-directed to habitual control found in rodents [11, 113, 301] has proven elusive in humans [302] (although see Hardwick et al. [303] for a novel design to test habitual responding in humans after overtraining). Moreover, some authors [26, 304] have questioned whether the formalization of habit and goal-directed processes as model-free and model-based control [27, 32] is suitable at all for the study of habitual behavior in humans. In light of these issues, alternative computational architectures have been proposed [for a review of other taxonomies, see 16] that might better align with classical behavioral findings across species. Furthermore, the recent back-translation of the two-step task for rodents [33,34,35,36,37] could also shed light on these issues. Indeed, translation of models and methodologies between preclinical and clinical research is crucial in the study of basic learning mechanisms in AUD. In recent years, the few human PIT studies exploring AUD have started to elucidate the influence of Pavlovian cues on instrumental behavior, revealing stronger PIT effects in AUD patients [290,291,292, 295]. Recent lines of research that have started incorporating animal concepts, e.g., sign- versus goal-trackers, into human studies [121, 122••] will certainly promote understanding of the relationship between Pavlovian and instrumental behavior.

The studies detailed above have further started to unearth the complex relationship between these basic learning processes and abstinence and relapse. Indeed, cue reactivity as reflected by psychophysiological changes and VS activation has been related to the probability of relapse at follow-up [206, 213, 215, 219, 220]. Similarly, model-based control, together with alcohol expectancies, could both predict long-term abstinence [78••] and improve with long-term abstinence [79]. PIT-associated fronto-striatal changes have also been related to relapse risk [291, 295, 296•]. Animal models seem particularly relevant here, as they could help tease out the potential components influencing the long-term maintenance of abstinence and have already demonstrated how treatments influencing fronto-striatal function can improve goal-directed control [21, 67, 73]. Preclinical models are also irreplaceable for studying the influence of age and gender in AUD development, a field that human research can only explore retrospectively, but in which rodent studies have already provided powerful insights [69,70,71, 180,181,182].

A better understanding of the outlined instrumental and Pavlovian learning processes involved in the development, maintenance, and relapse of AUD ultimately holds promise to improve individualized treatment options for this disorder. Novel intervention strategies that target automated approach tendencies and potentially also craving elicited by Pavlovian conditioned cues include cognitive bias modification [305,306,307], pharmacological adjuncts to boost cue exposure therapy [308, 309], or techniques focusing on reconsolidation processes [310,311,312] (reviewed in detail by Beck et al. in this issue).

In order to reach this goal, research should aim for longitudinal studies using reliable and ecologically valid paradigms of Pavlovian and instrumental processes with alcohol-related cues and outcomes, which should be combined with state-of-the-art imaging techniques, computational modeling, and ecological momentary assessment methods that collect real-time craving and substance use data in daily life [313]. This will allow us to better understand how these basic learning mechanisms contribute to the initial development and maintenance of AUD.