Introduction

Cocaine dependence is characterized by compulsive drug use and maladaptive decision-making. Adolescents are particularly vulnerable to the effects of cocaine; for example, cocaine exposure during adolescence reduces treatment seeking in individuals with substance use disorders across the lifespan [1]. It is thus critically important to understand how developmental cocaine exposure impacts complex decision-making, and to develop strategies to normalize abnormal decision-making.

The orbitofrontal cortex (OFC) is thought to build “task spaces,” for example allowing organisms to select optimal actions based on expected outcomes [2]. As such, OFC inactivation impairs goal-directed response choice, resulting in a deferral to familiar, habit-like behaviors that are insensitive to goals [3,4,5]. The OFC is also implicated in addiction etiology [6, 7]. For example, the OFC is hypoactive in cocaine users during protracted withdrawal [8]. In rodents, cocaine eliminates dendritic spines—the primary sites of excitatory plasticity in the brain—in the OFC [9,10,11]. When cocaine is administered during adolescence, dendritic spine and dendrite arbor losses in the OFC are detectable in adulthood [9, 11, 12]. Adolescent cocaine exposure also decreases brain-derived neurotrophic factor (BDNF) protein and mRNA in some regions of the rodent frontal cortex [13]. These same measures are difficult to collect in humans, but notably, low levels of blood BDNF are associated with treatment noncompliance in crack-cocaine users [14]. Drugs that target neurotrophin systems, which regulate neuronal structure and synaptic plasticity [15], may be therapeutically useful. Supporting this notion, an antioxidant that stimulates the high-affinity receptor for BDNF, termed 7,8-dihydroxyflavone (7,8-DHF) [16], improves action-outcome memory in mice [17].

Here, we examined whether 7,8-DHF would correct cocaine-induced decision-making biases. We also explored the utility of 3,4-methylenedioxymethamphetamine (MDMA). We were motivated by our evidence that MDMA increases cortical BDNF, described in this report. Meanwhile, reducing BDNF/trkB expression and activity occludes action-outcome conditioning, causing the same decision-making biases as adolescent cocaine exposure. Our findings suggest that OFC BDNF-trkB is necessary for goal-directed action selection, supporting action-outcome memory necessary for optimal goal attainment. Neurotrophin-based therapeutics may ultimately be useful for enhancing adaptive decision-making following adversities experienced during adolescence.

Methods

Subjects

Male and female C57BL/6 mice or mice homozygous for a “floxed” Bdnf gene (exon V) bred on a mixed BALB/c background (Jackson Laboratories, Bar Harbor, ME) were used. Mice were kept on a 12-h light/dark cycle (0800 on) and given ad libitum access to water and food, except during instrumental conditioning when animals were food restricted to ~ 90% of their baseline weight. All studies were approved by the Emory University Institutional Animal Care and Use Committee.

Drugs

Cocaine HCl (10 mg/kg, i.p., in saline, Sigma-Aldrich, St. Louis, MO) was delivered from postnatal days (P) 31–35. 7,8-DHF (5 mg/kg, i.p., in 17% DMSO and saline; TCI Chemicals, Portland, OR) or MDMA HCl (5.6 mg/kg, i.p., in saline, generously provided by NIDA) was delivered in adulthood (>P60). Drugs were administered in a volume of 1 mL/100 g.
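The dosing arithmetic implied by these values can be sketched as follows. This is an illustration only: the doses (mg/kg) and the injection volume (1 mL/100 g) come from the text, whereas the 20 g body weight and all function names are assumptions made for the example.

```python
# Dosing arithmetic sketch. Doses and the 1 mL/100 g injection volume are
# taken from the text; the body weight below is hypothetical.

ML_PER_G = 1.0 / 100.0  # 1 mL per 100 g body weight


def injection_volume_ml(body_weight_g):
    """Injection volume for one animal at 1 mL/100 g."""
    return body_weight_g * ML_PER_G


def solution_concentration_mg_per_ml(dose_mg_per_kg):
    """Concentration needed so that 1 mL/100 g delivers the target dose."""
    ml_per_kg = ML_PER_G * 1000.0  # equivalent to 10 mL/kg
    return dose_mg_per_kg / ml_per_kg


def drug_mass_mg(dose_mg_per_kg, body_weight_g):
    """Mass of drug delivered to one animal."""
    return dose_mg_per_kg * body_weight_g / 1000.0


weight_g = 20.0  # hypothetical mouse, for illustration only
for name, dose in [("cocaine", 10.0), ("7,8-DHF", 5.0), ("MDMA", 5.6)]:
    print(
        name,
        injection_volume_ml(weight_g),
        solution_concentration_mg_per_ml(dose),
        drug_mass_mg(dose, weight_g),
    )
```

Under these assumptions, a 10 mg/kg dose corresponds to a 1 mg/mL solution, and a 20 g mouse would receive 0.2 mL.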

Intracranial Surgery

Mice were anesthetized with ketamine/dexdomitor. With needles centered at Bregma on a leveled skull, coordinates were located using a digitized stereotaxic frame (Stoelting, Wood Dale, IL). Lentiviral vectors expressing Green Fluorescent Protein (GFP), Cre recombinase (Cre), or TrkB.t1 under the CMV promoter were generated by the Emory University Viral Vector Core. Viral vectors were delivered to the ventrolateral OFC at + 2.6 mm anteroposterior (AP), − 2.8 or − 2.85 mm dorsoventral, and ± 1.2 mm mediolateral (ML) relative to Bregma (coordinates based on [18]; see [17, 19]) in a volume of 0.25 (for Bdnf knockdown), 0.5 (for TrkB.t1 overexpression), or 1.5 (for TrkB.t1 overexpression) μL/side, as indicated. Throughout, infusions were delivered at 0.1 μL/min, and the microsyringe was left in place for 5 (0.25 and 0.5 μL total volume) or 12 (1.5 μL total volume) min before withdrawal and suturing. Following surgery, mice were left undisturbed for 2–3 weeks.

Food-Reinforced Instrumental Conditioning

Mice were food restricted and maintained at ~ 90% baseline weight for > 1 week prior to instrumental conditioning. Mice were trained to respond for food reinforcement (20 mg grain-based pellets; Bioserv, Frenchtown, NJ) in Med-Associates (Georgia, VT) operant conditioning chambers. Chambers were equipped with 2 nose poke recesses and a food magazine. Thirty reinforcers were available for responding on each of 2 nose poke recesses (60 pellets/session). Sessions lasted for 70 min or until 60 pellets were delivered. Males were trained using a fixed ratio 1 (FR1) schedule of reinforcement, for as many sessions as required to earn all available reinforcers in a given session (5–10 sessions across all experiments; the final 5 sessions are reported). Next, mice were shifted to a random interval 30-s (RI30) schedule for 2 sessions. Female mice can develop food-reinforced habits faster than males (see [20], consistent with our own experience), so we omitted RI training, which can promote habit-based behavior. Instead, female mice were trained using 10 FR1 sessions.

Action-outcome contingency degradation was conducted as previously described [11, 17, 21]. Briefly, in the “nondegraded” session, one nose poke aperture was occluded, and responding on the other aperture was reinforced according to a variable ratio 2 (VR2) schedule for 25 min. In the “degraded” session, the previously reinforced nose poke aperture was occluded, and pellets were delivered into the magazine noncontingently at a rate matched to the reinforcement rate from the previous session. Responding on the available nose poke recess was recorded but had no programmed consequences, thus violating the contingency between nose poking and reinforcement. Drugs (7,8-DHF or MDMA) were administered immediately following the “degraded” session during the presumptive period of action-outcome memory formation or updating. The location of the “degraded” aperture was counterbalanced between and within groups.
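The logic of the two sessions can be sketched in code. This is a simplified simulation, not the software used in the study: the ratio requirements for the VR2 schedule are assumed to be drawn uniformly from {1, 2, 3}, the noncontingent pellets are assumed to be evenly spaced, and the response times are randomly generated. Only the 25-min session length, the VR2 schedule, and the rate matching come from the text.

```python
import random

SESSION_S = 25 * 60  # 25-min session, from the text


def run_vr2_session(response_times_s, rng):
    """Return pellet delivery times under a variable ratio 2 schedule.

    Assumption: each ratio requirement is drawn uniformly from {1, 2, 3}
    (mean 2); the text does not say how the ratios were generated.
    """
    pellet_times = []
    required = rng.choice([1, 2, 3])
    count = 0
    for t in sorted(response_times_s):
        count += 1
        if count >= required:
            pellet_times.append(t)  # this response earns a pellet
            count = 0
            required = rng.choice([1, 2, 3])
    return pellet_times


def yoked_noncontingent_schedule(n_pellets, session_s=SESSION_S):
    """Pellet times for the "degraded" session, independent of responding.

    Assumption: deliveries are evenly spaced; the text states only that the
    delivery rate matched the reinforcement rate of the previous session.
    """
    if n_pellets == 0:
        return []
    interval = session_s / n_pellets
    return [interval * (i + 1) for i in range(n_pellets)]


rng = random.Random(0)
responses = [rng.uniform(0, SESSION_S) for _ in range(120)]  # hypothetical
pellets = run_vr2_session(responses, rng)
yoked = yoked_noncontingent_schedule(len(pellets))
print(len(pellets), "pellets earned;", len(yoked), "delivered noncontingently")
```

In the first function every pellet is tied to a response, whereas the second never consults responding at all, which is what makes the contingency "degraded".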

The next day, both apertures were available during a 5–10-min probe test conducted in extinction. Goal-directed decision-making is reflected by preferential responding on the “nondegraded” aperture (i.e., generating the response that was most likely to be reinforced), and equivalent responding is considered a failure in action-outcome conditioning, a deferral to familiar habit-based behavior.
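One way to summarize probe-test responding for a single mouse is sketched below. The preference index and the response counts are illustrative assumptions and are not the analysis reported here, which compared response rates by ANOVA; the strict "nondegraded greater than degraded" criterion for a response preference is likewise an assumption.

```python
def response_rate_per_min(responses, test_min):
    """Responses per minute during the probe test."""
    return responses / test_min


def shows_preference(nondegraded, degraded):
    """True if the mouse responded more on the "nondegraded" aperture."""
    return nondegraded > degraded


def preference_index(nondegraded, degraded):
    """Fraction of responses on the "nondegraded" aperture.

    0.5 indicates equivalent responding; returns None if the mouse made
    no responses at all.
    """
    total = nondegraded + degraded
    if total == 0:
        return None
    return nondegraded / total


def count_with_preference(mice):
    """Number of mice showing a response preference."""
    return sum(1 for nd, d in mice if shows_preference(nd, d))


# Hypothetical (nondegraded, degraded) response counts for four mice.
mice = [(24, 10), (15, 15), (8, 12), (30, 9)]
for nd, d in mice:
    print(nd, d, preference_index(nd, d), shows_preference(nd, d))
print(count_with_preference(mice), "of", len(mice), "mice show a preference")
```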

Locomotor Activity Following 7,8-DHF and Cocaine

Behaviorally naïve mice were placed in Med-Associates locomotor monitoring chambers equipped with 16 photocells. After 1 h of habituation, mice were injected with either vehicle or 7,8-DHF to determine whether 7,8-DHF modified locomotor activity. This process was repeated 3 times/week for 14 total sessions. Then, mice were left undisturbed for 10–14 days. To test for potential cross-sensitization between repeated 7,8-DHF and cocaine, mice were then returned to the locomotor monitoring chambers. Locomotor activity was measured for 1 h for habituation. All mice were then injected with saline and monitored for 1 h, followed finally by an injection of cocaine. Again, locomotor activity was monitored for 1 h. Photobeam breaks were compared between groups.

Locomotor Activity Following MDMA

To examine whether acute MDMA modified locomotor activity, naïve mice were placed in locomotor monitoring chambers equipped with 16 photocells (Med-Associates) to habituate for 30 min. Then, mice were administered saline or MDMA (counterbalanced across 2 days) and immediately placed back in the chambers for 60 min. The test was repeated so all mice ultimately received both saline and MDMA. Photobeam breaks were compared between groups.

Marble Burying

Behaviorally naïve mice were tested for anxiety-like behavior in a dimly lit room. Saline or MDMA was administered, in a counterbalanced order across 2 days, 30 min before testing. Mice were then placed in 11.5 × 14 × 7.5 in chambers with 3–4 in of bedding and 20 marbles arranged in a grid. The number of marbles buried (greater than ½ covered) was counted at 30 min. The test was repeated so all mice ultimately received both saline and MDMA.

Open Field Test

To examine whether MDMA changed another measure of anxiety-related behavior, locomotor activity in a novel open field was measured. Naïve mice were administered saline or MDMA, counterbalanced across 2 days. Thirty minutes later, mice were placed into 45 × 39 × 37 cm open field chambers with 16 × 16 photocells positioned 2.5 cm off the chamber floor (San Diego Instruments, San Diego, CA). Beam breaks along the periphery (defined as within 2 in of the outer edge of the chamber) and center (defined as a 3 in square in the center of the chamber) were counted for 10 min, and the ratio of time spent in the periphery vs. the center was calculated. The test was repeated so all mice ultimately received both saline and MDMA.

Immunohistochemistry

Histology

Mice were euthanized by rapid decapitation, and brains were immersed in 4% paraformaldehyde for 48 h and then 30% w/v sucrose for at least 72 h prior to being sectioned at 50 μm. Infusion sites were verified by immunostaining for Cre (as in [22], with the primary antibody concentration at 1:750; Sigma-Aldrich) or the HA tag on the TrkB.t1 virus (as in [23], with the primary antibody concentration at 1:1000; Sigma-Aldrich). Alternatively, GFP or mCherry was visualized. Sections were mounted and coverslipped with Permount mounting medium (Thermo Fisher Scientific, Waltham, MA) and imaged.

Quantitative Imaging

Sections were immunostained for the HA tag (as above) or glial fibrillary acidic protein (as in [24], with the primary antibody concentration at 1:1000; DakoCytomation, Carpinteria, CA). Sections were imaged on a Nikon 4550 s SMZ18 stereo microscope (Nikon Instruments, Melville, NY). All images were collected in the same session with settings held constant. A sampling area was drawn around the infusion site, and the mean integrated intensity was quantified in NIS Elements (Nikon Instruments).

Western Blotting

Mice were euthanized by rapid decapitation 2 h following MDMA or 30 min, 2 h, or 8 h following 7,8-DHF administration (as indicated). Brains were flash frozen on dry ice and stored at − 80 °C. Brains were sectioned into 1-mm-thick coronal sections, and a single experimenter dissected tissue from the OFC, medial prefrontal cortex (mPFC), and dorsal striatum using a 1 mm tissue corer. Samples were sonicated in lysis buffer [200 μL; 137 mM NaCl, 20 mM Tris-HCl (pH = 8), 1% IGEPAL, 10% glycerol, 1:100 Phosphatase Inhibitor Cocktails 1 and 2 (Sigma-Aldrich), and 1:1000 Protease Inhibitor Cocktail (Sigma-Aldrich)]. Protein concentrations were determined using a Pierce BCA Protein Assay kit (Thermo Fisher Scientific) following manufacturer’s instructions.

Fifteen or 20 μg of each sample was separated by SDS-PAGE on 7.5% or 4–20% gradient Tris-glycine gels (Bio-Rad Laboratories, Inc., Hercules, CA) and then transferred to PVDF membranes (Bio-Rad). Blots were blocked in 5% nonfat dry milk for 1 h. The primary antibodies used were Extracellular-signal Regulated Kinase 1/2 (ERK1/2) (1:2000; Cell Signaling Technology, Danvers, MA), phospho-ERK1/2 (1:500, Cell Signaling Technology), BDNF (1:250; Abcam), trkB (1:375; Cell Signaling Technology), phospho-trkB (Tyr 706/707) (1:250; Cell Signaling Technology), and HSP70 (1:6000; Santa Cruz Biotechnology, Dallas, TX). Membranes were incubated in primary antibody overnight at 4 °C and then at room temperature for 1 h in secondary antibody goat anti-mouse or anti-rabbit peroxidase labeled IgG (1:10,000; Vector Laboratories, Burlingame, CA). Bands were measured using a chemiluminescence substrate (Thermo Fisher Scientific) and a ChemiDoc MP Imaging System (Bio-Rad). Values were normalized to the HSP70 loading control and then to the control mean on each gel to control for variation across membranes. Phosphorylation values were normalized to the corresponding total protein.
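The normalization steps can be illustrated with a short sketch. The densitometry values are invented for the example and the function names are hypothetical. Expressing the phospho-to-total ratio relative to the control mean is an assumption about the order of operations, which the text does not fully specify.

```python
from statistics import mean


def normalize_to_loading(band, hsp70):
    """Divide each band intensity by its lane's HSP70 loading control."""
    return [b / h for b, h in zip(band, hsp70)]


def normalize_to_control_mean(values, is_control):
    """Express each value relative to the mean of control lanes on the gel."""
    control_mean = mean(v for v, c in zip(values, is_control) if c)
    return [v / control_mean for v in values]


def phospho_to_total(phospho, total):
    """Ratio of phosphorylated to total protein for each lane."""
    return [p / t for p, t in zip(phospho, total)]


# Hypothetical densitometry values for one gel: 2 control, 2 treated lanes.
is_control = [True, True, False, False]
bdnf = [100.0, 120.0, 150.0, 180.0]
hsp70 = [50.0, 60.0, 50.0, 60.0]
bdnf_norm = normalize_to_control_mean(
    normalize_to_loading(bdnf, hsp70), is_control
)
print(bdnf_norm)

phospho_erk = [30.0, 36.0, 60.0, 72.0]
total_erk = [60.0, 72.0, 60.0, 72.0]
perk_norm = normalize_to_control_mean(
    phospho_to_total(phospho_erk, total_erk), is_control
)
print(perk_norm)
```

Because each gel is scaled to its own control mean, the control group averages 1.0 on every membrane, which is what allows values to be pooled across gels.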

Enzyme-Linked Immunosorbent Assay

Tissue samples were homogenized as for western blots (above). BDNF was quantified using a BDNF enzyme-linked immunosorbent assay (ELISA) kit (Promega Corporation, Madison, WI) following manufacturer’s instructions, excluding the extraction step. Samples were tested in duplicate and normalized to the control mean on each independent plate.

Statistical Analyses

Statistical analyses were performed using SPSS with α ≤ 0.05. Response rates and locomotor counts were compared by two- or three-factor ANOVA with repeated measures when appropriate. ANOVA or t tests were used to compare group means from western blots, ELISAs, and the marble burying assay. If data had unequal variance (according to Levene’s test for equality of variance), then Welch’s t test was used to compare group means. In the case of significant interactions, Bonferroni or uncorrected post hoc t test comparisons were used. All significant post hoc comparisons are indicated graphically. Throughout, values > 2 standard deviations from the mean were considered outliers and excluded.
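For readers unfamiliar with these procedures, the sketch below illustrates the 2-standard-deviation exclusion rule and Welch's t statistic with its Welch–Satterthwaite degrees of freedom. It is not the SPSS procedure used here. Using the sample standard deviation, applying the rule once per group, and all data values are assumptions. Welch's correction generally yields non-integer degrees of freedom, consistent with values such as the t6.38 reported in the Results.

```python
from math import sqrt
from statistics import mean, stdev, variance


def exclude_outliers(values, n_sd=2.0):
    """Drop values more than n_sd sample SDs from the group mean.

    Assumption: a single pass using the sample SD of the full group.
    """
    m = mean(values)
    sd = stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * sd]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical data: one value lies far from the rest of its group.
raw = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
kept = exclude_outliers(raw)
print(kept)

# Hypothetical groups with unequal variances.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 3), round(df, 2))
```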

Results

Adolescent Cocaine Exposure Has Long-Term Behavioral Consequences

Mice were exposed to cocaine from P31–35, selected because cocaine exposure during this brief window weakens sensitivity to action-consequence contingencies in adulthood, causing a deferral to familiar, habit-based behaviors [11]. Consistent with our prior investigations in males, adult mice of both sexes with a history of cocaine exposure were able to develop food-reinforced nose poke responses, with no differences between groups (both F < 1) (Fig. 1A–C, E). Next, the pellet associated with one of the trained responses was delivered noncontingently, whereas the other nose poke response remained reinforced. In a probe test conducted the next day, control mice preferentially generated the nose poke that remained reinforced—evidence of goal-directed decision-making—whereas cocaine-exposed mice failed to differentiate between behaviors—evidence of habit-like behavior (interaction males F1,21 = 4.918, p = 0.038; females F1,13 = 5.377, p = 0.037) (Fig. 1D, F). Because sex did not impact the effects of adolescent cocaine exposure on decision-making, we focused exclusively on male mice for subsequent experiments.

Fig. 1

Mice exposed to cocaine as adolescents are insensitive to changes in action-outcome contingencies as adults. (a) Mice were exposed to cocaine during early adolescence, then tested for sensitivity to action-outcome contingencies in adulthood. (b) Schematic of task. Mice are trained to respond equally on two distinct apertures. Then, the probability of reinforcement greatly decreases for one response, and instead, pellets are delivered noncontingently. Preferential engagement of the response that remains likely to be reinforced (“nondegraded”) is interpreted as goal-directed behavior, whereas equivalent responding is considered habit-based. (c) Male mice with a history of cocaine exposure acquired the nose poke responses. (d) Following instrumental contingency degradation, however, mice with a history of adolescent cocaine exposure were unable to differentiate between responses that were likely, vs. unlikely, to be reinforced (“nondegraded” vs. “degraded”). Instead, these mice engaged two familiar responses equally, habitually. (e) Similarly, female mice with a history of adolescent cocaine exposure acquired nose poke responses, but were (f) unable to modify responding following instrumental contingency degradation, deferring instead to familiar, habit-based strategies. Symbols and bars represent means + SEMs, *p < 0.05

Stimulating trkB Improves Action-Outcome Memory

We hypothesized that drugs that stimulate trkB may restore goal-directed action selection in adult mice exposed to cocaine as adolescents. To this end, we administered the trkB agonist 7,8-DHF immediately following the contingency degradation procedure, during the presumptive updating or solidification of new memory regarding action-outcome contingencies (Fig. 2A). We then assessed response preference the following day. In other words, 7,8-DHF was given only following instrumental contingency degradation, and mice were drug-free during the probe test. As before, response rates did not differ during initial nose poke training (F < 1) (Fig. 2B). As hypothesized, cocaine weakened sensitivity to action-outcome contingencies, whereas 7,8-DHF intervention restored goal-directed response preference (interaction F1,44 = 6.612, p = 0.014) (Fig. 2C).

Fig. 2

The trkB agonist 7,8-DHF strengthens action-outcome memory. (a) We exposed mice to cocaine during early adolescence, then tested for sensitivity to action-outcome contingencies in adulthood. 7,8-DHF was given immediately following instrumental contingency degradation, then response preference was assessed the following day when mice were drug-free. (b) Mice acquired the instrumental responses without group differences. (c) Developmental cocaine exposure induced failures in action-outcome decision-making, resulting in nonselective responding, although 7,8-DHF corrected these failures. (d) 7,8-DHF increases trkB phosphorylation in the OFC and mPFC, and (e) effects are rapid, not detectable 2 or 8 h later. Representative blots are above. (f) Experimental timeline: Next, we delayed the probe test by one week to assess whether 7,8-DHF facilitated long-term memory. (g) Developmental cocaine exposure induced failures in action-outcome decision-making, as expected, resulting in nonselective responding, although 7,8-DHF corrected these failures. Symbols and bars represent means + SEMs, *p < 0.05

We envision that 7,8-DHF supports new action-consequence memory. Implicit in this assumption is that 1) 7,8-DHF stimulates trkB in the minutes-to-hours after familiar contingencies are violated and 2) new action-outcome associations should be durable, detectable well after 7,8-DHF exposure. To explore these possibilities, we conducted two additional experiments: First, we administered 7,8-DHF to naïve mice and collected brains. Phosphorylated (active) trkB was elevated across mPFC and OFC samples 30 min later (main effect of 7,8-DHF F1,7 = 6.776, p = 0.035; no interaction) (Fig. 2D). When we collected OFC samples 2 and 8 h after injection, groups did not differ (main effect F2,22 = 1.9, p = 0.173) (Fig. 2E), suggesting that 7,8-DHF stimulates trkB during a period when mice are presumably solidifying new action-outcome memories, rather than during the probe test conducted 24 h later when they can use new action-outcome information to make decisions.

We next administered 7,8-DHF to another group of trained mice, exactly as above, except mice were then left undisturbed for 1 week between 7,8-DHF injection and the probe test, rather than 1 day (Fig. 2F). A history of cocaine exposure weakened action-outcome decision-making as expected. Meanwhile, 7,8-DHF-treated mice (±cocaine) generated strong response preferences, reflecting long-term action-outcome memory (interaction F1,38 = 5.113, p = 0.030) (Fig. 2G), which is OFC-dependent (see [17, 19]). Notably, control, vehicle-injected mice were more variable in their responding than in our experiments above, with only 7/11 mice generating a response preference and post hoc comparisons not reaching statistical significance. This pattern potentially indicates that memory for action-outcome relationships can markedly weaken within 1 week.

Some evidence suggests that stimulating trkB could heighten sensitivity to drug “reward” (see “Discussion”). Locomotor sensitization is a widely used measure of reward-related plasticity. We next tested whether systemic 7,8-DHF administration—as used here—induced sensitization or cross-sensitization with cocaine, referring to the amplification of a locomotor response to a drug (in this case, cocaine) due to prior experience with a different compound (in this case, 7,8-DHF). Vehicle or 7,8-DHF was administered repeatedly to otherwise naïve mice, and locomotor activity was monitored. Activity was higher during the daily habituation periods than following injections (main effect F1,78 = 131.5, p < 0.001), but we detected no drug effects (F13,78 = 1.2, p = 0.24) (Fig. 3A). Following an extended drug washout period, mice were again habituated to the chambers, then administered saline, and finally, cocaine. Although cocaine elicited more locomotor activity as expected (main effect F2,12 = 24.6, p = 0.003), we found no other significant differences between 7,8-DHF and vehicle groups (interaction F2,12 = 6.3, p = 0.04; all post hoc comparisons p > 0.055) (Fig. 3B). In other words, repeated 7,8-DHF does not induce locomotor sensitization, nor does it affect subsequent locomotor sensitivity to cocaine.

Fig. 3

Repeated 7,8-DHF does not affect locomotion or locomotor sensitivity to cocaine. (a) Mice were repeatedly injected with vehicle or 7,8-DHF following a habituation period. Locomotor activity was higher during the habituation periods than following drug or vehicle injection, but we detected no effects of drug or interactions between day and injection. (b) Mice were allowed a washout period, followed by a (re-)habituation period, then saline and cocaine injections. Although cocaine elicited more locomotor activity overall, we found no other significant differences. In other words, 7,8-DHF did not influence locomotor sensitivity to cocaine. Means + SEMs, *p = 0.03 saline vs. cocaine, **p < 0.001 activity counts during habituation vs. activity counts following injection

MDMA Stimulates OFC BDNF and Enhances Action-Outcome Memory

One adverse consequence of systemic trkB activation is pain hypersensitivity [25], likely making trkB agonists therapeutically unviable. We were thus interested in examining other drugs that might stimulate BDNF-trkB with greater brain region specificity. MDMA reportedly increases PFC BDNF [26]. We found using ELISA that MDMA increased BDNF in the OFC, though not mPFC, while reducing BDNF in the dorsal striatum (interaction F2,45 = 3.36, p = 0.04) (Fig. 4A, B; brain regions are represented separately for clarity). Using western blotting, we found that MDMA increased both the mature and pro forms of BDNF (mature t8 = − 1.984, p = 0.042; pro-BDNF t6.38 = − 2.435, p = 0.024) (Fig. 4C, D) and ERK1/2 phosphorylation (t7 = 2.333, p = 0.05) (Fig. 4E) in the OFC.

Fig. 4

MDMA stimulates OFC BDNF-ERK1/2 and action-outcome memory. (a) MDMA increased BDNF in the OFC, as quantified by ELISA. Inset: MDMA did not increase BDNF in the mPFC. (b) MDMA also decreased BDNF in the dorsal striatum. (c) Western blotting revealed increased mature BDNF and (d) pro-BDNF protein in the OFC, concurrent with (e) elevated phospho-ERK1/2. (f) Experimental timeline. Bdnf was reduced using Cre-expressing viral vectors in “floxed” Bdnf mice. Mice were then tested for sensitivity to action-outcome contingencies. MDMA was given immediately following instrumental contingency degradation, then response preference was assessed the following day when mice were drug-free. (g) Representations of viral vector spread on images from the Mouse Brain Library [27] are shown, with white representing maximal spread and black the smallest spread. “VLO” refers to the ventrolateral OFC. (h) Infusion of Cre-expressing viral vectors decreased BDNF in gross OFC tissue punches. Incomplete protein loss was expected, given that tissue punches contained both infected and uninfected tissues. (i) Mice acquired the instrumental responses with no group differences. (j) Bdnf knockdown induced failures in action-outcome conditioning, resulting in nonselective, habit-based responding, although MDMA corrected these failures. Symbols and bars represent means + SEMs, *p < 0.05

To determine whether the pro-neurotrophic effects of MDMA could influence decision-making strategies, we next confirmed that MDMA could normalize response abnormalities due to Bdnf deficiency—i.e., by reinstating BDNF tone. We selectively reduced Bdnf in the ventrolateral OFC using viral-mediated gene silencing, resulting in a ~ 13% loss of BDNF protein in gross OFC tissue punches containing both infected and uninfected tissue (t16 = 2.368, p = 0.031) (Fig. 4F–H). We then trained mice to respond for food reinforcers (F < 1) (Fig. 4I) and tested sensitivity to instrumental contingency degradation. Bdnf knockdown impaired action-outcome decision-making, causing a bias towards habit-based behaviors, as expected (see [17, 28]). Meanwhile, MDMA administered in conjunction with instrumental contingency degradation stimulated action-outcome memory, causing significant response preferences during a subsequent probe test (interaction F1,26 = 5.233, p = 0.044) (Fig. 4J). Thus, MDMA can normalize action-outcome decision-making following Bdnf knockdown.

We hypothesized that MDMA enriches action-outcome decision-making by increasing OFC BDNF, which presumably strengthens action-outcome memory by stimulating local trkB. To test this perspective, we infused a lentivirus expressing a truncated, inactive isoform of trkB (TrkB.t1) and an HA tag into the ventrolateral OFC (Fig. 5A). We aimed to sequester BDNF and dampen trkB-mediated intracellular signaling (see viral vector validation in [29]). We infused two volumes to vary the degree of TrkB.t1 overexpression. Histological analyses indicated that viral vector spread did not differ between 0.5 and 1.5 μL groups (Fig. 5B, C), as expected based on the limited mobility of lentiviral vectors in the brain. The 1.5 μL infusion generated greater HA immunoreactivity (t14 = − 2.672, p = 0.018) (Fig. 5D), however, indicating a higher density of viral infection within the OFC, also as expected. Levels of GFAP (an astrocytic marker) did not differ (t13 = 1.258, p = 0.23) (Fig. 5E), indicating that the 1.5 μL infusion did not cause greater tissue damage.

Fig. 5

MDMA strengthens action-outcome memory via trkB in the OFC. (a) Constructs for lenti-TrkB.t1 and lenti-GFP viral vectors. (b) Lenti-TrkB.t1 and lenti-GFP viral vectors were infused in 0.5 μL or 1.5 μL volumes. Representations of viral vector spread on images from the Mouse Brain Library [27] are shown, with white representing maximal spread and black the smallest spread. (c) We detected no difference in viral vector spread, indicating both infusions remained highly contained within the ventrolateral OFC, but (d) the groups differed in HA immunofluorescence, with the larger volume generating greater fluorescence, indicating greater infection. Inset: Representative HA immunofluorescence. “VLO” refers to the ventrolateral OFC. (e) GFAP levels did not differ between groups, suggesting that the larger infusion volume did not cause greater tissue damage. (f) Mice acquired the instrumental responses with no group differences. (g) Both low and high volumes of lenti-TrkB.t1 caused failures in action-outcome conditioning. Impairments caused by the lower TrkB.t1 volume were corrected by MDMA, but larger-volume TrkB.t1 overexpression caused failures that were insensitive to MDMA-mediated blockade. Symbols and bars represent means + SEMs, *p < 0.05

Groups did not differ during instrumental response training (F < 1) (Fig. 5F), including the 0.5 and 1.5 μL control groups, which were combined. As with Bdnf knockdown above, TrkB.t1 overexpression impaired action-outcome response choice, causing a deferral to habit-based behavior. Meanwhile, MDMA corrected decision-making strategies in the 0.5, but not 1.5, μL group (interaction F2,32 = 7.792, p = 0.002) (Fig. 5G). This pattern suggests that MDMA strengthens action-outcome memory by stimulating trkB-mediated intracellular signaling locally in the OFC, thus overcoming the effects of TrkB.t1 overexpression. And when the OFC is saturated with TrkB.t1, MDMA is unable to exert its behavioral effects.

Finally, we tested whether MDMA could improve action-outcome memory in mice exposed to developmental cocaine (Fig. 6A). We again administered MDMA immediately following contingency degradation training, then tested response preference the next day, when mice were drug-free. Again, developmental cocaine exposure did not affect response acquisition (F < 1) (Fig. 6B), but did impair action-outcome decision-making and, as hypothesized, MDMA rescued goal-directed action selection following developmental cocaine exposure (interaction F1,42 = 4.166, p = 0.048) (Fig. 6C).

Fig. 6

Acute MDMA improves action-outcome memory following cocaine. (a) Experimental timeline. As before, mice were exposed to cocaine during early adolescence, then tested for sensitivity to action-outcome contingencies in adulthood. MDMA was given immediately following instrumental contingency degradation, then response preference was assessed the following day when mice were drug-free. (b) Response rates did not differ between groups during instrumental response acquisition. (c) Developmental cocaine exposure induced failures in action-outcome decision-making, resulting in nonselective, habit-like responding, although MDMA corrected these failures. Symbols and bars represent means + SEMs, *p < 0.05

Additional experiments utilizing this dose of MDMA revealed modest decrements in locomotor activity; however, these were transient and would not be expected to affect responding during the probe test, which occurred the day after injection (interaction F7,170 = 3.938, p < 0.001) (Fig. 7A). MDMA also decreased marble burying, considered a measure of compulsive- and anxiety-like behavior (t10 = 3.428, p = 0.006) (Fig. 7B), whereas exploration of an open field, another assay of anxiety-like behavior, was not affected (t10 = − 1.78, p = 0.109) (Fig. 7C). Both measures were collected 30 min after injection, when locomotor activity was not affected.

Fig. 7

Acute MDMA transiently modifies locomotor activity and decreases marble burying. (a) MDMA transiently decreased, and then increased, locomotion. (b) MDMA also decreased marble burying, detectable 30 min after treatment, but (c) did not change the ratio of locomotion in the periphery vs. center of an open field. Bars and symbols represent means + SEMs, *p < 0.05

Altogether, our findings suggest that low-dose MDMA enriches action-outcome memory, including in organisms previously exposed to cocaine, enabling optimal goal-sensitive response selection. MDMA also decreases compulsive-like behavior (marble burying). Thus, we cautiously suggest that MDMA may be an effective adjunct to behavioral therapy for cocaine-exposed individuals seeking to maintain a drug-abstinent lifestyle.

Discussion

Here, we utilized an instrumental contingency degradation procedure to determine whether mice selected actions based on their consequences. Mice were trained to respond equally on two nose poke recesses for food reinforcers, then the reinforcer associated with one response was provided noncontingently. A goal-oriented response strategy is to preferentially generate the response that remains reinforced and requires action-outcome memory. Meanwhile, equivalent responding reflects a deferral to familiar, habit-like response patterns requiring no new memory [30]. Cocaine exposure during early adolescence (P31–35) interfered with action-outcome decision-making in adult male and female mice. This pattern is consistent with our prior investigations, in which male mice subjected to the same subchronic cocaine exposure in adolescence, but not adulthood, developed habit biases [11]. Our focus on adolescence is motivated by evidence that most drug-addicted adults initiated drug use during this time [31], but we emphasize that chronic exposure to cocaine, amphetamine, and other drugs of abuse induces habit-based behaviors in mature rodents as well [32, 33], a phenomenon that in humans may preclude goal-oriented treatment seeking.

Blockade of Cocaine-Induced Habits

Cocaine impacts multiple brain regions involved in complex decision-making, including the OFC. For example, the OFC atrophies in cocaine abusers [34,35,36], and the degree of hypoactivity and cell loss in the OFC is commensurate with the extent of cocaine use [34, 37, 38]. Additionally, in cocaine self-administering rodents, dendritic spines are eliminated in the OFC [10]. Adolescent cocaine exposure causes lasting changes in neuron structure, decreasing dendritic spine densities [9, 11] and simplifying dendrite arbors [12]. Cocaine-induced changes in dendrite structure in the ventrolateral subregion of the OFC may be causally related to impairments in action-outcome decision-making, given that drugs that correct cocaine-induced habit biases appear to recruit local cytoskeletal regulatory systems [11].

Neurotrophins are potent regulators of neuronal structure. For example, BDNF increases dendritic spine size and density by activating its high-affinity receptor, trkB [39]. The trkB receptor is also associated with dendritic spine homeostasis in cocaine-exposed mice, promoting both drug-induced dendritic spinogenesis in the nucleus accumbens and the recovery of typical spine densities following abstinence [40]. OFC-selective Bdnf knockdown impairs action-outcome conditioning, causing habit-like response biases that are reversible by the small-molecule trkB agonist 7,8-DHF at doses that also stimulate dendritic spinogenesis in the OFC [17]. Based on this profile, we administered 7,8-DHF to cocaine-exposed mice. Cocaine exposure during adolescence caused habit-like response biases in adulthood, as expected, whereas the trkB agonist reinstated goal-directed response choice. Interestingly, cocaine-exposed mice also appeared to respond less overall, a pattern that was also corrected by 7,8-DHF. Low response rates could conceivably be attributable to deliberation-like behavior following cocaine, as recently revealed by Sweis, Redish, and Thomas in a “restaurant row” task in which mice must weigh several costs and benefits in order to obtain food. In that task, cocaine decreased the efficiency of response choice at the expense of animals’ ability to earn their daily food [41].

Despite our evidence that a trkB agonist can correct cocaine-induced decision-making biases, there are several concerns regarding the utility of broadly stimulating BDNF-trkB for therapeutic purposes. In addition to concerns about pain hypersensitivity [25], BDNF in different brain regions appears to have opposing effects on cocaine-related behaviors [42, 43]. For instance, a trkB antagonist blocks cue-induced reinstatement of cocaine seeking [44], and this outcome may be attributable to BDNF-trkB activity in the nucleus accumbens, which can elicit cocaine self-administration and cocaine-seeking behavior in some circumstances [45,46,47] (though not others [40]). On the other hand, other evidence suggests that the loss of trkB activity increases motivation for the drug [40], and the same trkB antagonist discussed above paradoxically stimulates trkB phosphorylation in the PFC [44]. PFC BDNF-trkB apparently decreases motivation to take cocaine [48], facilitates the extinction of cocaine-conditioned place preference [28, 49], and blocks cocaine-induced habits (present report). Also, BDNF infusions into the mPFC and systemic 7,8-DHF treatment decrease drug seeking [50,51,52,53], and we report that 7,8-DHF does not affect locomotor sensitivity to cocaine. We suggest that BDNF-trkB stimulation in the PFC could support drug abstinence via learning and memory systems, and potentially other mechanisms, including coordinated interactions with neurotrophic systems in the nucleus accumbens.

With these patterns in mind, we next determined the effects of MDMA on BDNF in the OFC. We were motivated by evidence that MDMA enhances conditioned fear extinction, a PFC-dependent process [54], and increases PFC Bdnf [26, 55]. Here, MDMA stimulated BDNF in the OFC but not mPFC (see also [54]), indicating that previously reported MDMA-induced elevations in PFC BDNF may be driven by the inclusion of OFC tissues, often considered “prefrontal.” MDMA also increased phosphorylation of ERK1/2, a trkB substrate. This finding is important, given that the BDNF-mediated interference with cocaine seeking depends on trkB-mediated activation of ERK1/2 [52]. Interestingly, MDMA also decreased BDNF in the dorsal striatum, a provocative finding given that dorsal striatal BDNF augments the reinforcing properties of cocaine [56].

Next, we reduced Bdnf selectively in the ventrolateral OFC using viral-mediated gene silencing. As in prior reports, Bdnf knockdown interfered with action-outcome conditioning, inducing habit-based behavior [17, 28]. Meanwhile, pairing MDMA with action-outcome contingency degradation enabled mice to update action-outcome associations, presumably by stimulating BDNF in noninfected neurons. Notably, pairing acute cocaine with instrumental contingency degradation occludes action-outcome association updating [57], an effect that differs from that of MDMA even though both drugs are psychostimulants. Whether and how acute cocaine impacts ventrolateral OFC neurotrophin systems remains unclear. Surprisingly, OFC Bdnf knockdown here did not impact response rates during instrumental response training, as was previously reported [17, 22, 28]. This outcome is possibly explained by a longer period of food restriction prior to training, which may have allowed mice to habituate to food restriction stress. Also, BDNF loss here was slightly more modest than in our prior experiments [17].

MDMA strengthened action-outcome memory, blocking cocaine-induced habit biases, potentially suggesting that MDMA could be used as an adjunct to behavioral therapy for individuals struggling with cocaine abuse. Although there is some concern regarding the use of MDMA as a therapeutic compound [58], it apparently has clinical efficacy in treating PTSD [59, 60], and acute administration does not induce obvious harmful effects or trigger recreational usage [60]. Importantly, overexpression of a truncated form of trkB (TrkB.t1) at high titer in the ventrolateral OFC interfered with action-outcome memory, even when MDMA was co-administered, suggesting that the corrective properties of MDMA are OFC trkB-dependent.

Conclusions

The OFC is thought to integrate information about context, stimuli, and goals to predict likely outcomes—particularly when those outcomes are not readily observable [2]. Accordingly, inactivation of the ventrolateral compartment can cause outcome-insensitive, habit-based response strategies in rodents [4, 17, 19, 61,62,63,64] and nonhuman primates [65, 66]. Cocaine users display habit biases in similar tasks, even when responding for nondrug reinforcers, as with our mice here [67]. Experiments using inducible inactivation strategies suggest that excitatory plasticity in the OFC when action-outcome expectations are violated is necessary for optimal decision-making later, even when the OFC is back “on line” [19]. We report that pharmacological strategies that stimulate OFC BDNF-trkB systems when expectations are being violated improve subsequent adaptive decision-making, including following cocaine. Thus, neurotrophin-modulating therapeutics may be effective adjuncts for addicted individuals seeking to maintain a drug-abstinent lifestyle, particularly individuals first exposed to cocaine early in life.