Introduction

Posttraumatic stress disorder (PTSD) is an impairing anxiety-related disorder that is marked by deficits in several aspects of learning (Jovanovic et al., 2010; Jovanovic et al., 2012; Pacella et al., 2013). For example, individuals with PTSD exhibit overgeneralization of conditioned fear from threat-related cues to approximations of threat-related cues that are safe (Kaczkurkin et al., 2017; Lopresto et al., 2016), impaired inhibition of previously learned fear associations, and impaired recall of extinction learning (for a review, see Lissek & van Meurs, 2015). Given the ubiquity of learning deficits in PTSD, identifying specific learning-related processes that are disrupted is an important research endeavor.

Computational methods have enhanced researchers’ ability to circumscribe cognitive processes with greater sensitivity and precision (Price et al., 2019; Stephan & Mathys, 2014). The progression of reinforcement learning (RL) models has been particularly successful, with computational models increasingly capturing behavior and learning phenomena not well explained by prior models (Cochran & Cisler, 2019; Le Pelley, 2004; Mihatsch & Neuneier, 2002; Redish et al., 2007). For example, whereas the standard Rescorla-Wagner (RW) model, which was developed to quantitatively formalize Pavlovian RL, models simple trial-and-error (“model-free”) learning, it does not capture more dynamic processes that occur during conditioning and extinction. Indeed, hybrid Pearce-Hall/RW learning models, which track trial-by-trial associability (i.e., the salience of a cue), provide a better fit for probabilistic (Brown et al., 2018) and fear-related learning (Homan et al., 2019; Li et al., 2011) than the standard RW model among healthy controls and individuals with PTSD.

In addition to data-driven model development, alignment of computational models with psychological theory is paramount (Huys et al., 2016), as this will allow for the testing and refinement of current hypotheses regarding the role of learning in disorders such as PTSD. Whereas prior research has primarily assessed relatively simple trial-and-error learning in PTSD (Brown et al., 2018; Cisler et al., 2015; Cisler et al., 2019; Ross et al., 2018), few studies have included models that capture higher-level cognitive processes. Importantly, theories of anxiety and PTSD implicate abstract, higher-order cognition in fear learning and extinction (Dunsmoor & Murphy, 2015). For example, the degree to which learned fear is generalized may depend on individuals’ reasoning about internal conceptual representations (Dunsmoor & Murphy, 2015). In addition to the ability to reason about, integrate, and abstract categorical representations, which is facilitated by the anterior prefrontal cortex (Davis et al., 2017), the abstraction of rules and contextual information is mediated by prefrontal regions (Cools et al., 2004; Fogelson et al., 2009). Evidence of impaired contextual modulation of fear learning among individuals with PTSD further suggests a potentially important role of complex cognitive functions in the acquisition and revision of fear (Steiger et al., 2015).

Unlike model-free (e.g., RW) and hybrid RL, which involve trial-and-error revisions of cue-outcome associations, model-based RL captures structured and dynamically shifting conditions or “rules” of learning (Daw et al., 2005; Redish et al., 2007). Model-based RL is theorized to support the development of internal models (i.e., cognitive maps) that contain hypotheses about different task conditions to allow a learner to make predictions about future actions within a changing environment (Daw et al., 2005; Gläscher et al., 2010). During a situation in which a stimulus is paired with an aversive outcome (e.g., a circle is paired with a shock), an individual utilizing model-free RL would develop a single outcome expectation for that stimulus (e.g., “circles are dangerous”). By contrast, during a situation in which the pairing of an aversive outcome and a stimulus can change depending on situational factors (i.e., the abstract ‘state’ of the environment), an individual utilizing model-based RL would develop separate outcome expectations for the various conditions and would differentially weight their expectations depending on which situation (i.e., state) was currently relevant (e.g., “circles are dangerous in situation X, but not Y”). In PTSD, difficulty with the latter may contribute to challenges with new safety learning, which is common in PTSD (Fani et al., 2012).

Whereas model-free and hybrid RL are primarily implemented within regions of the ventral striatum, amygdala, and the salience network (Beierholm et al., 2011; Brown et al., 2018; Cisler et al., 2019; Daw et al., 2011; Gläscher et al., 2010; Ross et al., 2018), model-based RL is primarily implemented within regions of the prefrontal cortex and frontoparietal network (FPN; Beierholm et al., 2011; Gläscher et al., 2010). Supporting the possibility that individuals with PTSD may have deficits in model-based RL, prior research has documented deficits in cognitive functions that are implemented within brain regions that overlap with those that support model-based RL, such as dorsolateral prefrontal cortex, inferior frontal gyrus, and anterior prefrontal cortex (Leskin & White, 2007; Polak et al., 2012; Stein et al., 2002; Woon et al., 2017; Alvarez & Emory, 2006; Doll et al., 2016). Thus, model-based RL may be disrupted in PTSD to a greater extent than model-free processes.

The primary goal of the present study was to build on previous work assessing model-based RL during acquisition and extinction of fear among women with assault-related PTSD. The study focused on assault-related PTSD because previous research has consistently shown that assault is a more potent risk factor for the development of PTSD than other forms of trauma (Breslau et al., 1998; Cisler et al., 2012; Frans et al., 2005; Kessler et al., 2017; Resnick et al., 1993). Additionally, because different forms of trauma predict different PTSD symptom profiles (Kelley et al., 2009), we selected participants with assault-related PTSD to increase the homogeneity of our participants, allowing us to avoid potential confounds of trauma type. Women were specifically selected for inclusion, because women are (1) twice as likely as men to develop PTSD (Kessler et al., 2005; Kilpatrick et al., 2013) and (2) at higher risk of exposure to many forms of interpersonal violence than men, including rape, sexual assault, and physical assault by an intimate partner (Iverson et al., 2013). It was hypothesized that the model-based RL model would provide a better fit for participants’ behavior than the model-free and hybrid models, which do not allow a learner to develop sets of cue-outcome associations that are differentially applied and updated based on inferences about task rules (e.g., rules that differ for the acquisition and extinction context). It was further hypothesized that FPN encoding of trial-by-trial updates of current beliefs about task conditions, which are specific to the model-based model and contribute to differential weighting of cue value expectations based on a learner’s hypotheses, would predict PTSD symptom severity. Specifically, it was anticipated that reduced FPN encoding would predict greater PTSD symptom severity, reflecting poorer contextually derived learning. 
Because PTSD is related to difficulties with fear extinction/safety learning (Jovanovic et al., 2009; Jovanovic et al., 2012), potential differences in the encoding of model-belief updates during acquisition versus extinction were explored. Due to high overlap between belief update and prediction error parameters during extinction, the acquisition versus extinction analyses focused on value expectations while controlling for belief updates.

Methods

Participants

A total of 103 women were enrolled as part of a larger randomized clinical trial across two study sites: (1) University of Arkansas for Medical Sciences and (2) University of Wisconsin-Madison (note: n = 175 participants were assessed for eligibility, and n = 103 were enrolled; for full recruitment information, see Cisler et al., 2020). Primary inclusion criteria included female sex, age between 21 and 50 years, and a current diagnosis of PTSD related to sexual or physical assault. Primary exclusion criteria included psychotic symptoms, pregnancy, learning disability, and medication or magnetic resonance imaging (MRI) contraindications. Twelve women were excluded due to a missed task visit, claustrophobia, or a positive drug screen, yielding 91 participants. Of these 91 participants, a total of 85 had either viable skin conductance responses or neuroimaging data (see Computational Modeling and Neuroimaging sections below).

Clinical Interview and Measures

The past-month version of the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5) was used to assess for the presence of current PTSD related to interpersonal violence (i.e., assault-related PTSD; Weathers et al., 2018). The CAPS-5 is a 30-item structured clinical interview that assesses for PTSD symptoms across four clusters: reexperiencing, avoidance, negative cognitions and mood, and hyperarousal. Symptoms are rated on a scale from 0 (absent) to 4 (extreme/incapacitating). To meet criteria for current PTSD, individuals must endorse a score of 2 or above for at least one reexperiencing symptom, one avoidance symptom, two negative cognitions/mood symptoms, and two hyperarousal symptoms. In addition to providing a categorical diagnosis, total symptom scores provided a dimensional measure of current PTSD symptoms. Although all participants had a diagnosis of PTSD, there was substantial variation in the CAPS-5 symptom severity scores (see distribution of scores in Supplemental Figure S1). The Receptive One-Word Picture Vocabulary Test, Fourth Edition (ROWPVT-4), was used as a proxy measure of intelligence quotient (IQ; Martin & Brownell, 2011). IQ was estimated to account for potential effects of individual differences in IQ on model-based RL, given that IQ deficits have previously been found to relate to poorer model-based RL (Culbreth et al., 2016). During the ROWPVT-4, participants match vocabulary words that are administered verbally to illustrations that are presented in a book (Brownell, 2000). Scores were normed according to chronological age. Additional clinical and trauma assessments were completed by participants but were not of primary interest. Follow-up tests were implemented to account for the potential impact of these variables on results (for a description of the assessments and results of the follow-up tests, see the Supplement).

Fear Conditioning and Fear Extinction Task

Participants completed four task blocks that alternated between fear acquisition and fear extinction (Fig. 1a). The first acquisition block was preceded by a baseline (habituation) period of 12 trials (6 presentations of each cue) without any administrations of the unconditioned stimulus (UCS). The UCS was an electrotactile stimulation that was delivered to participants’ lower leg. Stimulation level was set to a maximum of 50 mA, and participants’ stimulation level was individually calibrated before the task at a level that was uncomfortable but not painful (approximately 7 of 10 on a Likert scale: 0 = not uncomfortable, 10 = extremely uncomfortable/painful). Triangles and circles served as the conditioned stimuli and different colored backgrounds identified the current context (i.e., the acquisition or extinction block), which were counterbalanced across participants (i.e., for half of participants, the CS+ was a triangle and for the other half the CS− was a triangle).

Fig. 1

a Schematic representation of the acquisition and extinction blocks of the Fear Conditioning and Fear Extinction Task. b Representation showing the temporal mapping of value estimations (V), prediction errors (PE), and latent state belief updates (dB) during the no shock and shock trials. For the neuroimaging analyses, the onset phase was parametrically modulated by trial-by-trial value expectations ($V_{t,c}$) and the outcome phase was parametrically modulated by trial-by-trial PEs (positive and negative; $PE_{t,c}$) and latent-state belief updates ($dB_{t,c}$)

During each fear acquisition block, 18 conditioned safety cues (CS−) and 18 conditioned danger/threat cues (CS+) were presented for 3 seconds, with an intertrial interval of 2-6 seconds. During acquisition, the presentation of the CS+ was followed by an electrotactile stimulation (UCS) on 50% of trials, which occurred 2.5 seconds after the CS+ onset for a duration of 500 ms. During each fear extinction block, there were 18 trials each of the CS− and CS+ cues, which were presented for 3 seconds, with an intertrial interval of 2-6 seconds. During extinction, no electrotactile stimulations occurred following the presentation of the CS+. The CS− and CS+ stimuli were pseudorandomly presented during each block, and participants completed a total of 156 task trials.
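The trial total reported above can be checked from the block structure. The sketch below is only an arithmetic illustration; the variable names are hypothetical, and the block order (acquisition first, then alternating) is an assumption based on the task description.

```python
# Arithmetic check of the task structure described in the text.
# Names and the exact block order are illustrative assumptions.
HABITUATION_TRIALS = 12          # 6 presentations of each cue, no UCS
TRIALS_PER_CUE = 18              # per block, for both CS+ and CS-
BLOCK_ORDER = ["acquisition", "extinction", "acquisition", "extinction"]
CS_PLUS_SHOCK_RATE = {"acquisition": 0.5, "extinction": 0.0}

trials_per_block = 2 * TRIALS_PER_CUE
total_trials = HABITUATION_TRIALS + trials_per_block * len(BLOCK_ORDER)

# Expected number of UCS deliveries across the task (CS+ trials only).
expected_shocks = sum(TRIALS_PER_CUE * CS_PLUS_SHOCK_RATE[b] for b in BLOCK_ORDER)

print(total_trials, expected_shocks)
```

Under these assumptions the habituation period plus four 36-trial blocks account for the 156 trials reported.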

Skin Conductance Response Acquisition and Preprocessing

Following an approach used in previous studies (Homan et al., 2019; Li et al., 2011), participants’ SCR data were used to test the fit of several RL models. SCR has previously been shown to map onto value expectations and associability during RL (Li et al., 2011). More specifically, anticipatory SCR scales with the degree to which individuals expect that an outcome will occur for a given cue (e.g., delivery of an electrotactile stimulation), with larger SCRs reflecting greater expectation of an outcome. Model fit was tested by minimizing the error between model-estimated trial-wise value expectations and participants’ trial-wise SCR.

SCR data were acquired from participants’ left hand with the BIOPAC MP150 Data Acquisition System using the EDA100C module with the MECMRI-TRANS (MRI compatible) cable system. BIOPAC AcqKnowledge 4.3 software recorded SCR data at a rate of 2,000 Hz at the Arkansas site and 1,000 Hz at the Wisconsin site. Data were preprocessed using an approach that is consistent with our prior studies (Cisler et al., 2020; Privratsky et al., 2020) and contemporary recommendations on modeling skin conductance data (Bach, 2014; Bach et al., 2010; Bach et al., 2013; Bach & Friston, 2013). This pipeline applied a 10-ms median filter and a unidirectional Butterworth band-pass filter (0.0159–5 Hz), followed by downsampling to 10 Hz. Next, trial-by-trial SCRs were estimated using a forward convolution model of SCR and were normalized to individuals’ maximum SCR. Seventeen participants were excluded from computational modeling analyses due to flat responding, an excessive number of artifacts, or missing SCR data, yielding 74 participants whose data were included in the computational modeling. The amount of SCR data loss (19%) is comparable to prior fear extinction studies using SCR (Garfinkel et al., 2014; Haaker et al., 2013; Raij et al., 2018).

Computational Models

To identify whether a model-free, hybrid, or model-based RL model (from here on referred to simply as model-free, hybrid, and model-based models) provided a better fit to participants’ SCR data, several models were tested. The primary set of models that were tested against the model-based model included a standard RW model and a hybrid model, both of which have previously been used to estimate learning parameters from SCR data during fear conditioning tasks and the latter of which has been found to provide a better fit than the standard RW model (Li et al., 2011). Additional versions of the standard RW model were tested for completeness (see the Supplement).

Model-Free Model

Model-free RL was assessed with several versions of the Rescorla-Wagner (RW) model. The standard RW model assumes that a learning agent keeps track of associative strengths representing the learner’s expectations for an outcome following the presentation of a cue. For a learning agent, we let a continuous variable $V_{t,c}$ denote the associative strength (i.e., value expectation) on trial $t$ for observed cue $c$. We also let a binary variable $\mathrm{outcome}_t$ denote the outcome on trial $t$, with $\mathrm{outcome}_t$ equal to 1 if the participant received a shock and 0 otherwise. Associative strengths are updated via prediction errors (PEs), defined as the difference between what happened (outcome) and what was expected; each update is the PE scaled by the learning rate. If cue $c$ is presented on trial $t$, then the PE is denoted by $\delta_t = \mathrm{outcome}_t - V_{t,c}$, and the associative strength of cue $c$ is updated as follows: $V_{t+1,c} = V_{t,c} + \alpha_t \delta_t$, where $\alpha_t$ is a constant on trial $t$ known as a learning rate. For the standard RW model, the same learning rate is used for all trials and associative strengths are only updated for presented cues, i.e., $\alpha_t = \alpha$ for all $t$ and $V_{t+1,c} = V_{t,c}$ for any cue $c$ not presented on trial $t$.
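The standard RW update can be illustrated with a minimal sketch. The function name, the starting value of zero, and the example trial sequence are assumptions made for illustration and are not taken from the study's analysis code.

```python
def rescorla_wagner(cues, outcomes, alpha, v0=0.0):
    """Return the value expectation held for the presented cue on each trial.

    cues:     sequence of cue labels, one per trial
    outcomes: sequence of 0/1 outcomes (1 = shock), one per trial
    alpha:    constant learning rate
    v0:       assumed initial associative strength for every cue
    """
    values = {}        # associative strength per cue
    expectations = []  # expectation for the presented cue, before the outcome
    for cue, outcome in zip(cues, outcomes):
        v = values.get(cue, v0)
        expectations.append(v)
        delta = outcome - v              # prediction error
        values[cue] = v + alpha * delta  # only the presented cue is updated
    return expectations


# Hypothetical four-trial sequence.
cues = ["CS+", "CS-", "CS+", "CS-"]
outcomes = [1, 0, 1, 0]
expectations = rescorla_wagner(cues, outcomes, alpha=0.5)
print(expectations)  # [0.0, 0.0, 0.5, 0.0]
```

Because unpresented cues are left untouched, the CS− expectation stays at its starting value while the CS+ expectation moves toward the reinforced outcome.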

Hybrid Model

The hybrid model builds on the RW model: $V_{t+1,c} = V_{t,c} + \kappa \alpha_t \delta_t$, where $V_{t,c}$ is the value expectation of cue $c$ on the current trial $t$, $\kappa$ is a learning rate, $\alpha_t$ is the associability for the current trial, and $\delta_t$ is the PE for the current trial (Le Pelley, 2004; Li et al., 2011). As with the RW model, $\delta_t = \mathrm{outcome}_t - V_{t,c}$. Unlike the RW model, a constant learning rate ($\kappa$) scales an associability parameter that changes from trial to trial and is defined as $\alpha_{t+1} = \eta|\delta_t| + (1-\eta)\alpha_t$. The free parameter $\eta$ weights the magnitude of the prior PE, and its complement, $1-\eta$, weights the prior associability (i.e., associability is a running weighted average of absolute PEs). A higher $\eta$ reflects greater weighting of the prior PE relative to the prior associability.
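A minimal sketch of the hybrid update follows. Tracking associability separately for each cue and starting it at 1.0 are assumptions made here for illustration; the text does not state the initial values, and the function name is hypothetical.

```python
def hybrid_model(cues, outcomes, kappa, eta, v0=0.0, alpha0=1.0):
    """Hybrid Pearce-Hall/RW sketch.

    Returns the value expectation and associability held for the presented
    cue on each trial, both recorded before the outcome is observed.
    v0 and alpha0 are assumed starting values.
    """
    values, assoc = {}, {}
    v_trace, alpha_trace = [], []
    for cue, outcome in zip(cues, outcomes):
        v = values.get(cue, v0)
        a = assoc.get(cue, alpha0)
        v_trace.append(v)
        alpha_trace.append(a)
        delta = outcome - v                             # prediction error
        values[cue] = v + kappa * a * delta             # value update
        assoc[cue] = eta * abs(delta) + (1 - eta) * a   # associability update
    return v_trace, alpha_trace


# Hypothetical sequence: one reinforced CS+ trial followed by two omissions.
v_trace, alpha_trace = hybrid_model(["CS+"] * 3, [1, 0, 0], kappa=0.5, eta=0.5)
print(v_trace)      # [0.0, 0.5, 0.25]
print(alpha_trace)  # [1.0, 1.0, 0.75]
```

In this example associability falls once the absolute PE shrinks, which slows subsequent value updates relative to a constant learning rate.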

Model-Based Model

The latent state (LS) model was used to capture model-based RL (Cochran & Cisler, 2019; Letkiewicz et al., 2020). This model was previously found to explain learning phenomena characterized by contextually based learning (e.g., renewal, spontaneous recovery) better than other widely used models. Latent states are unobserved task rules/conditions that, in aggregate, define a learning environment. Each latent state contains a set of associations between cues and outcomes, and a learner must infer which associations are currently most applicable. The LS model builds on the RW model by using an integer $l$ (i.e., the latent state) to index these sets of associations, where $V_{t,c,l}$ is the current associative strength of cue $c$ for latent state $l$: $V_{t+1,c,l} = V_{t,c,l} + \alpha_{t,c,l} \delta_{t,l}$. For the LS model, the learning rate ($\alpha_{t,c,l}$) is specific to the cue ($c$) on the current trial ($t$) for latent state $l$. The learning rate is proportional to a quantity that captures the degree to which a learner believes that the current task conditions are captured by a given latent state, referred to as latent-state beliefs, $p_{l,t}$. The PE is specific to the latent state, whereby $\delta_{t,l} = \mathrm{outcome}_t - V_{t,c,l}$ for cue $c$ on trial $t$ for latent state $l$. Trial-by-trial expectations $V_{t,c}$ of cue $c$ are computed by taking a weighted average of $V_{t,c,l}$ with weights $p_{l,t}$. Following an outcome on a given trial, beliefs about current task conditions are updated (delta beliefs, dB). Larger updates in latent-state beliefs reflect larger changes in a learner’s internal model of the current task rules (see Supplement for additional details).
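The belief-weighted expectation and the state-specific update can be sketched for a single cue on a single trial. This is a deliberately partial illustration: latent-state beliefs are supplied as inputs rather than inferred, the belief-update rule (and therefore dB) is described in the Supplement and is not reproduced, and the proportionality constant `base_rate` is a hypothetical stand-in for the remaining terms of the learning rate.

```python
def ls_trial(values_by_state, beliefs, outcome, base_rate):
    """One LS-style trial for a single cue (partial sketch).

    values_by_state: associative strength of the cue under each latent state
    beliefs:         latent-state beliefs for this trial (assumed to sum to 1)
    outcome:         0/1 outcome on this trial
    base_rate:       assumed proportionality constant for the learning rate
    """
    # Overall expectation is the belief-weighted average across latent states.
    expectation = sum(p * v for p, v in zip(beliefs, values_by_state))

    updated = []
    for p, v in zip(beliefs, values_by_state):
        delta = outcome - v                       # state-specific PE
        updated.append(v + base_rate * p * delta)  # rate scales with belief
    return expectation, updated


# Hypothetical example: state 0 predicts shock, state 1 predicts safety,
# and the learner currently favors state 1.
expectation, updated = ls_trial([1.0, 0.0], [0.25, 0.75], outcome=0, base_rate=0.5)
print(expectation, updated)  # 0.25 [0.875, 0.0]
```

The example shows how a shock omission barely revises the "danger" state's association when the learner attributes the trial mostly to the other state.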

Analyses

Computational Modeling

Skin conductance responses that were acquired during the Fear Conditioning and Extinction Task were used to identify optimal participant model parameter estimates. For each participant, model parameters were estimated by fitting computational models to SCR data from participants without any missing data (n = 74) via maximum likelihood estimation. Following convention, a square root transformation was applied to the SCR (prior to the transformation, SCR values were rescaled between 0 and 1). Normalized, square root transformed SCR values were regressed onto trial-by-trial linear value estimation terms ($V_{t,c}$). Skin conductance responses were also regressed onto associability ($\alpha_t$) for the hybrid model and onto updates in latent state beliefs ($dB_t$) for the LS model. Regression error was assumed to follow a normal distribution with mean zero and unknown variance. Maximum likelihood estimation was performed by minimizing squared regression error summed only over trials in which a shock was omitted (Li et al., 2011) using fmincon in Matlab (The MathWorks, Inc., Natick, MA). For each participant and RL model, estimation yielded regression coefficients and RL model parameters. Resulting log-likelihood values were compared to identify the best-fitting model. Additional regression parameters were included in exploratory analyses to identify whether the inclusion of non-linear terms would capture large trial-by-trial changes in SCR not readily captured by linear terms, thereby yielding better model fit (results provided in the Supplement).
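The fitting logic can be sketched in a few lines for the simplest case. This is not the study's procedure: a grid search over the learning rate is substituted for fmincon, only a single cue and a single regressor (value expectation plus intercept) are modeled, and the variance floor is a numerical safeguard added here. All function names and the simulated data are hypothetical.

```python
import math


def rw_expectations(outcomes, alpha):
    """RW value expectations for a single cue, recorded before each outcome."""
    v, trace = 0.0, []
    for outcome in outcomes:
        trace.append(v)
        v += alpha * (outcome - v)
    return trace


def ols_sse(x, y):
    """Sum of squared errors from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def fit_alpha(outcomes, scr, grid):
    """Grid-search the learning rate that minimizes squared regression error."""
    y = [math.sqrt(s) for s in scr]                        # square-root transform
    keep = [i for i, o in enumerate(outcomes) if o == 0]   # shock-omitted trials only
    best_alpha, best_sse = None, float("inf")
    for alpha in grid:
        v = rw_expectations(outcomes, alpha)
        sse = ols_sse([v[i] for i in keep], [y[i] for i in keep])
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    n = len(keep)
    sigma2 = max(best_sse / n, 1e-12)                      # floor avoids log(0)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return best_alpha, best_sse, loglik


# Simulated noiseless data generated with a known learning rate of 0.3.
outcomes = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
true_v = rw_expectations(outcomes, 0.3)
scr = [(0.2 + 0.6 * v) ** 2 for v in true_v]
grid = [i / 10 for i in range(1, 10)]
alpha_hat, sse, loglik = fit_alpha(outcomes, scr, grid)
print(alpha_hat)
```

With Gaussian errors, minimizing the summed squared error is equivalent to maximizing the likelihood, which is why the log-likelihood can be computed directly from the minimized error.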

Neuroimaging

All neuroimaging analyses focused on the LS model parameters (see Supplement for neuroimaging acquisition and preprocessing details). Trial-by-trial dBs, which characterize model-based RL updates, were carried forward to voxelwise and independent component analysis (ICA) to identify brain regions and networks that support implementation/encoding of latent-state updates among women with PTSD. Additionally, analyses focused on identifying whether PTSD symptom severity modulated dB-related encoding. Because individual-level parameters have previously been shown to be too noisy to yield reliable neuroimaging results (i.e., they exhibit high levels of error variance), learning parameters (e.g., V, PE, dB) were averaged across participants and the resulting trial-by-trial mean parameters were used in the neuroimaging analyses in accordance with previous research (Daw et al., 2005; Daw et al., 2011; Li et al., 2011; Schönberg et al., 2007; Schönberg et al., 2010). Of the 91 participants who were eligible for the present study, 77 participants had viable neuroimaging data. However, one participant was excluded due to missing clinical variables (final sample: n = 76; see the Supplement for information regarding the overlap between the computational modeling and neuroimaging samples). See Table 1 for demographic and clinical characteristics of participants included in the computational modeling and/or neuroimaging analyses (n = 85).

Table 1 Participant demographic and clinical characteristics

Voxelwise Analyses

Participants’ voxelwise time courses were regressed onto the design matrix using AFNI (3dREML; Cox, 1996). The design matrix included the stimulus onset and outcome phases of the task. The onset phase was parametrically modulated by trial-by-trial value expectation ($V_{t,c}$) from the LS model, and the outcome phase was parametrically modulated by trial-by-trial PE (positive and negative; $PE_{t,c}$) and latent-state belief updates ($dB_{t,c}$) from the LS model (Fig. 1b). To account for the potential impact of the electrotactile stimulation on neural activity, the design matrix also included a “shock” regressor. Because PE and the occurrence of the shock are highly correlated (Erdeniz et al., 2013), PE-related neural activity was not interpreted. Beta coefficients for the onset phase modulated by V and the outcome phase modulated by dB were included in the second-level analyses. Linear mixed effects models (LMEs) were implemented using MATLAB (fitlme; The MathWorks, Inc., Natick, MA) to test for main effects of neural activity during dB and V, controlling for age, IQ, and study site:

$$ \mathsf{dB}\ \mathsf{\beta}\sim \mathsf{age}+\mathsf{IQ}+\mathsf{study}\ \mathsf{site}+\left(\mathsf{1}|\mathsf{subject}\right) $$
(1)
$$ \mathsf{V}\ \mathsf{\beta}\sim \mathsf{age}+\mathsf{IQ}+\mathsf{study}\ \mathsf{site}+\left(\mathsf{1}|\mathsf{subject}\right) $$
(2)

Additionally, a model tested for potential unique effects of dB and V-related neural activity on CAPS-5 symptom severity. PE was included as a predictor, but PE results were not interpreted (for the reasons stated above):

$$ \mathsf{CAPS}\sim \mathsf{V}\ \mathsf{\beta}+\mathsf{dB}\ \mathsf{\beta}+\mathsf{PE}\ \mathsf{\beta}+\mathsf{age}+\mathsf{IQ}+\mathsf{study}\ \mathsf{site}+\left(\mathsf{1}|\mathsf{subject}\right) $$
(3)

A set of follow-up analyses assessed for differential effects of acquisition versus extinction on the results of models 2 and 3. The design matrix included separate regressors for trial onset modulated by value expectation during acquisition ($V_{t_a,c}$) and extinction ($V_{t_e,c}$), and separate regressors for outcome modulated by dB during acquisition ($dB_{t_a,c}$) and extinction ($dB_{t_e,c}$). Because PE during extinction ($PE_{t_e,c}$) was highly correlated with both $V_{t_e,c}$ and $dB_{t_e,c}$ (|r| > 0.70), PE was not included in these analyses. Additionally, given that outcome-related neural activity associated with PE could not be separated from that of dB, LME tests focused on whether neural activity during V significantly differed during acquisition versus extinction (contrast coded: acquisition = 1, extinction = −1), controlling for age, IQ, and study site:

$$ \mathsf{V}\ \mathsf{\beta}\sim \mathsf{contrast}+\mathsf{age}+\mathsf{IQ}+\mathsf{study}\ \mathsf{site}+\left(\mathsf{1}|\mathsf{subject}\right) $$
(4)

Separate models tested for potential unique effects of V and dB-related neural activity on CAPS symptom severity during acquisition and extinction:

$$ \mathsf{CAPS}\sim \mathsf{V}\ {\mathsf{\beta}}_{\mathsf{acquisition}}+\mathsf{dB}\ {\mathsf{\beta}}_{\mathsf{acquisition}}+\mathsf{age}+\mathsf{IQ}+\mathsf{study}\ \mathsf{site}+\left(\mathsf{1}|\mathsf{subject}\right) $$
(5)
$$ \mathsf{CAPS}\sim \mathsf{V}\ {\mathsf{\beta}}_{\mathsf{extinction}}+\mathsf{dB}\ {\mathsf{\beta}}_{\mathsf{extinction}}+\mathsf{age}+\mathsf{IQ}+\mathsf{study}\ \mathsf{site}+\left(\mathsf{1}|\mathsf{subject}\right) $$
(6)

Voxelwise comparisons were implemented within a sample-specific grey matter mask. Cluster-level thresholding controlled for multiple voxelwise comparisons using an uncorrected voxelwise threshold of p < 0.001 and a minimum cluster size of k ≥ 18, which was identified using AFNI’s 3dClustSim.

Independent Component Analysis

ICA was used to identify large-scale neural networks that are spatially distributed and temporally coactivated, and was implemented using the Group ICA of fMRI Toolbox (GIFT; Calhoun et al., 2001) in Matlab R2016a. A model order of 35 was selected to balance the tradeoff between component estimation reliability and interpretability. Thirteen of the 35 components were identified as functional networks theoretically related to learning or PTSD, including a left and right FPN that were of primary interest (22 components that represented either motion artifact, CSF, or networks of non-interest such as motor cortex were excluded). Additionally, follow-up analyses were implemented with the remaining 11 networks (see Supplemental Figure S2). ICA time courses were regressed onto the same design matrices described above using AFNI (3dREML; Cox, 1996) and resulting beta coefficients were included in the second-level analyses. The same series of LMEs described above were implemented for (1) each FPN and (2) follow-up networks using Matlab (fitlme; The MathWorks, Inc.). Bonferroni correction was applied for the two FPN networks (p < 0.025) and for the post-hoc analyses across the eleven additional networks (p < 0.005). Additionally, following an approach used by Erdeniz et al. (2013), several GLMs were fitted to participants’ ICA time courses to identify whether the inclusion or removal of the PE and/or shock regressors altered the main effects of dB and/or V (see Supplement).
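The two correction thresholds follow from dividing a family-wise alpha by the number of tests. The sketch below assumes a family-wise alpha of 0.05, which is not stated explicitly in the text but is consistent with the reported thresholds.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return family_alpha / n_tests


FAMILY_ALPHA = 0.05  # assumed; consistent with the thresholds reported above

fpn_threshold = bonferroni_threshold(FAMILY_ALPHA, 2)        # left and right FPN
followup_threshold = bonferroni_threshold(FAMILY_ALPHA, 11)  # remaining networks

print(fpn_threshold)                 # 0.025
print(round(followup_threshold, 4))  # 0.0045
```

Under this assumption, the reported follow-up threshold of 0.005 corresponds to 0.05/11 rounded to three decimal places.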

Results

Model Fit

The standard RW and hybrid models, which are nested models, were formally tested using a log-likelihood ratio test. Similar to previous studies (Boll et al., 2013; Homan et al., 2019; Li et al., 2011), the hybrid model outperformed the standard RW model, χ2 = 515.43, df = 148, p < 0.001. It also outperformed the additional RW models that were tested (see Supplement). Because the LS model does not contain the terms included in the hybrid model, a log-likelihood ratio test was not performed to compare these models. A comparison between Akaike Information Criterion values, which were summed across participants, revealed that the LS model provided a better fit than the standard RW and hybrid models (Fig. 2), as well as the additional RW models that were tested (Supplemental Figure S2).
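The two comparison statistics used above can be sketched as follows. The log-likelihood values and parameter counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; the reported df of 148 across 74 participants is consistent with two additional free parameters per participant in the hybrid model, which is an inference rather than something stated in the text.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion for one participant (lower is better)."""
    return 2 * n_params - 2 * loglik


def summed_aic(logliks, n_params):
    """AIC summed across participants, each fitted with n_params parameters."""
    return sum(aic(ll, n_params) for ll in logliks)


def likelihood_ratio(ll_reduced, ll_full, extra_params_per_participant):
    """Likelihood ratio statistic and df for nested models fitted per participant."""
    stat = 2 * (sum(ll_full) - sum(ll_reduced))
    df = extra_params_per_participant * len(ll_full)
    return stat, df


# Hypothetical log-likelihoods for two participants.
ll_rw = [-10.0, -12.0]
ll_hybrid = [-8.0, -9.5]

stat, df = likelihood_ratio(ll_rw, ll_hybrid, extra_params_per_participant=2)
aic_rw = summed_aic(ll_rw, n_params=3)          # parameter counts are illustrative
aic_hybrid = summed_aic(ll_hybrid, n_params=5)

print(stat, df)            # 9.0 4
print(aic_rw, aic_hybrid)  # 56.0 55.0
```

The likelihood ratio test applies only to nested models, which is why the LS model is compared with the others through summed AIC instead.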

Fig. 2

Summed Akaike Information Criterion (AIC) values across participants showing that the Latent State model outperformed the Rescorla-Wagner and Hybrid models (note: lower AIC values reflect better model fit)

Main Effects

Voxelwise Analyses

Table 2 lists the brain regions in which neural activation predicted latent-state belief updates. Activity in the left inferior frontal gyrus, left and right insula, right paracentral lobule, left and right calcarine gyrus, left postcentral/precentral gyrus, cerebellum, and right cuneus was positively related to latent-state belief updates (Fig. 3a). Activity in the left and right inferior occipital gyrus, superior orbital gyrus, left superior frontal gyrus, precuneus, left middle frontal gyrus, and left angular gyrus was negatively related to latent-state belief updates. Table 3 lists the brain regions in which neural activation predicted value expectation. Greater value expectation-related activity emerged in the paracentral lobule, right middle frontal gyrus, and left thalamus (Fig. 3b). Lower value expectation-related activity emerged in the left precuneus.

Table 2 Regions associated with trial-by-trial changes in latent-state beliefs (dB) during outcome
Fig. 3

Brain regions associated with (a) latent-state belief updates (dB) and (b) value estimations (V). Parameter-related neural activity that uniquely predicted PTSD symptom severity for (c) latent-state updates and (d) value estimations. L = left. Warm colors = positive z-values

Table 3 Regions associated with trial-by-trial associative strength (V) during stimulus onset

Table 4 lists the brain regions in which the parameter-related neural activity uniquely predicted PTSD symptom severity. Figure 3c shows that lower latent state update-related activation within the right calcarine gyrus/right posterior cingulate cortex predicted higher CAPS scores. As shown in Fig. 3d, greater value expectation-related activation within the left angular gyrus/inferior parietal lobule predicted higher CAPS scores.

Table 4 Parameter-related activity uniquely predictive of clinician-administered PTSD scale severity

ICA

Table 5 lists the ICA networks that predicted trial-by-trial latent-state belief update encoding. Reduced encoding was evident in the left frontoparietal network (FPN), as well as the limbic, dorsomedial prefrontal cortex/posterior cingulate cortex, and hippocampal networks (Fig. 4a). Increased encoding was evident in the pre-supplementary motor area and striatal networks. Table 5 also lists the ICA networks that predicted trial-by-trial value estimation encoding. Increased encoding emerged in the pre-supplementary motor area and striatal networks (Fig. 4b). Results for all networks are provided in the Supplement. A similar pattern of results emerged across study sites (Figure S5).

Table 5 Networks associated with trial-by-trial encoding of latent-state updates (dB) and value estimation (V)
Fig. 4

Depiction of the main effects of encoding during trial-by-trial changes in (a) latent-state beliefs (dB) and (b) value estimation (V) for GLM model 1 (full model: dB, PE, V, and shock regressor included)

Table 6 lists the networks in which parameter-related encoding uniquely predicted PTSD symptom severity on the CAPS. Increased encoding of value expectation within the left FPN predicted greater PTSD symptom severity, t(68) = 3.58, p < 0.001. No other results survived correction.

Table 6 Parameter-related activity uniquely predictive of clinician-administered PTSD scale severity

Task Phase Effects (Acquisition vs. Extinction)

Voxelwise Analyses

Table 7 lists the brain regions in which neural activation differentially predicted value expectation during the acquisition versus extinction task phases. Greater activity during extinction than acquisition emerged in the middle cingulate cortex, thalamus, left fusiform gyrus, right lingual gyrus, left calcarine gyrus, left middle occipital gyrus, and cuneus.

Table 7 Regions associated with trial-by-trial value expectation (V) during stimulus onset, acquisition versus extinction

ICA

Table 8 lists the ICA networks that exhibited differential value expectation encoding during the acquisition versus extinction task phases. A significant effect of task phase emerged for value expectation encoding within both FPNs. Specifically, greater value expectation encoding was evident in the left and right FPN during acquisition relative to extinction, t(146) = 2.53, p = 0.013, and t(146) = 2.62, p < 0.010, respectively (Fig. 5). Significantly greater value expectation encoding was also evident in the medial/lateral prefrontal cortex network during acquisition versus extinction, t(146) = 3.38, p < 0.001, whereas lower value expectation encoding was evident in the hippocampal network during acquisition versus extinction, t(146) = −2.92, p < 0.004. Results followed a similar pattern across study sites (Figure S6).

Table 8 Networks associated with trial-by-trial encoding of value estimation (V) during acquisition versus extinction
Fig. 5

Depiction of the effects of acquisition versus extinction on encoding during trial-by-trial changes in value estimation (V)

Table 9 lists the networks in which value expectation-related encoding uniquely predicted PTSD symptom severity on the CAPS during acquisition and extinction. As shown in Fig. 6a, during acquisition, but not extinction, greater value expectation encoding in the left FPN uniquely predicted higher PTSD symptom severity, t(69) = 2.99, p = 0.004. As shown in Fig. 6b, during extinction, but not acquisition, lower value expectation encoding in the dorsomedial prefrontal cortex/posterior cingulate cortex network uniquely predicted higher PTSD symptom severity, t(69) = −4.05, p < 0.001. A similar pattern of results was evident across study sites (Figure S7).

Table 9 V-related activity uniquely predictive of clinician-administered PTSD scale severity
Fig. 6

a Scatterplots depicting relationships between left frontoparietal network encoding and PTSD symptom severity during (1) acquisition and (2) extinction, and a graphical depiction of the left frontoparietal ICA network. b Scatterplots depicting relationships between dorsomedial prefrontal cortex network encoding and PTSD symptom severity during (1) acquisition and (2) extinction, and a graphical depiction of the dorsomedial prefrontal cortex ICA network. The dark blue scatterplots represent the relationships that reached significance. The light blue (non-significant) scatterplots are provided for reference

Discussion

Results in this sample of adult women with IPV-related PTSD revealed that the latent-state model provided a better fit for participants’ physiological responses during the fear conditioning task than the standard RW and hybrid models. Notably, the hybrid model is considered a “gold standard” for modeling fear conditioning responses, because it has been found to fit fear conditioning responsivity better than standard RW models in several previous studies (Homan et al., 2019; Li et al., 2011), a finding that was replicated in the current study. However, the hybrid model was outperformed by the latent-state model in the present study. This suggests that the latent-state model captures learning dynamics during fear conditioning and extinction that are not readily measured by the hybrid model, such as learning that varies as a function of task conditions.
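
For readers less familiar with the two comparison models, the following is a minimal Python sketch of their trial-by-trial updates. It assumes the commonly used textbook forms: a fixed learning rate for the standard RW model, and an associability term that tracks recent absolute prediction errors for the hybrid Pearce-Hall/RW model. The function names, parameter names (alpha, kappa, eta), and default values are illustrative assumptions, not the parameterization or estimates from the present study, and the latent-state model is not sketched here because its equations are not given in this section.

```python
from typing import List, Tuple


def rescorla_wagner(outcomes: List[float], alpha: float = 0.3,
                    v0: float = 0.0) -> List[float]:
    """Return the value expectation V held before each trial (standard RW).

    alpha is a fixed learning rate (illustrative default, not a fitted value).
    """
    v = v0
    values = []
    for r in outcomes:
        values.append(v)
        delta = r - v            # prediction error on this trial
        v = v + alpha * delta    # update with a constant learning rate
    return values


def hybrid_pearce_hall(outcomes: List[float], kappa: float = 0.5,
                       eta: float = 0.5, alpha0: float = 1.0,
                       v0: float = 0.0) -> Tuple[List[float], List[float]]:
    """Return (V, associability) held before each trial (hybrid PH/RW).

    kappa scales the update, eta controls how quickly associability
    tracks recent absolute prediction errors (both are assumed names).
    """
    v, alpha = v0, alpha0
    values, associability = [], []
    for r in outcomes:
        values.append(v)
        associability.append(alpha)
        delta = r - v
        v = v + kappa * alpha * delta                  # associability-gated update
        alpha = eta * abs(delta) + (1 - eta) * alpha   # dynamic associability
    return values, associability


if __name__ == "__main__":
    # Hypothetical outcome sequence: reinforced trials, then extinction.
    outcomes = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(rescorla_wagner(outcomes))
    print(hybrid_pearce_hall(outcomes))
```

In this sketch, the only difference between the two models is that the hybrid model's effective learning rate rises after surprising outcomes and decays when outcomes are well predicted. Neither model includes a term that depends on task conditions.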

Contrary to predictions, latent-state belief updates were associated with reduced (as opposed to increased) activity in regions previously associated with model-based RL and cognitive control, including the left FPN and left superior frontal gyrus. Similar to previous research on model-based RL, latent-state belief updates were related to increased striatal network activity (Daw et al., 2011; McDannald et al., 2011; but also see Gläscher et al., 2010). Latent-state belief updates were also related to increased activity of the left and right insula, which are implicated in punishment and loss-related learning (Palminteri et al., 2012). Although the bilateral anterior insula are components of the salience network, which is often associated with model-free PE encoding (Cisler et al., 2019; Preuschoff et al., 2008), this is not the first study to identify insula activation during model-based RL (Lee et al., 2014). Heightened insula activity during updates to latent-state beliefs may reflect sensitivity to unexpected and changing environmental demands that signal increased need for cognitive control processes during learning (Jiang et al., 2015).

Although the latent-state model provided a better overall fit to the SCR data than the other computational models that do not capture model-based RL processes, PTSD symptom severity was not predicted by FPN-related encoding of latent-state belief updates. Instead, higher PTSD symptom severity was uniquely related to reduced activity during latent-state updates in the right calcarine gyrus/posterior cingulate cortex, which are implicated in visual processing, visual imagery, and the focus of attention (Klein et al., 2000; Leech & Sharp, 2014). By contrast, increased value estimation-related encoding in the left FPN and activity of the angular gyrus/inferior parietal lobule uniquely predicted greater PTSD symptom severity. Moreover, increased value expectation encoding within the left FPN during acquisition predicted greater PTSD symptom severity, whereas reduced encoding within the dorsomedial prefrontal cortex/posterior cingulate cortex (dmPFC/PCC) network during extinction predicted greater PTSD symptom severity. Atypical FPN and dmPFC/PCC activity has been identified in previous studies of PTSD, including fear conditioning and extinction studies (for a review, see Suarez-Jimenez et al., 2020), although the direction of these effects is somewhat mixed. It is important to note that, in contrast with prior studies that did not separate RL and non-RL sources of variance, we were able to identify effects for distinct RL processes using our model-based approach. Heightened encoding of value estimation within cognitive control regions during the acquisition blocks (i.e., when threat expectancies are high) may contribute to enhanced representation of fear, whereas reduced encoding of value estimation within the dmPFC/PCC during the extinction blocks (i.e., when threat expectancies are low) may contribute to difficulties revising stimulus-outcome associations within a safe environment (Wang et al., 2014).

A notable limitation of the present study is the lack of a control comparison group. Although altered neural activity during latent-state belief updates did not emerge within cognitive control regions or networks in relation to PTSD symptoms, we did not test whether individuals with PTSD exhibit poorer encoding of latent-state belief updates within FPNs relative to individuals without PTSD. It is possible that PTSD symptoms scaled with value expectations, rather than latent-state belief updates, within the left FPN because individuals with PTSD generally had difficulty engaging contextual learning processes during fear conditioning. It will be important to establish the typical pattern of neural activity and encoding of value expectations and latent-state belief updates to further contextualize the meaning of the present results. It will also be important to establish whether results extend to non-IPV-related PTSD.

Overall, results provide some evidence that model-based RL processes that are altered during fear conditioning are related to PTSD symptoms among women with assault-related PTSD. Research has primarily been devoted to examining relatively simplistic learning processes. However, model-free and hybrid models cannot capture higher-level, context-related learning (e.g., learning when a stimulus is dangerous vs. safe), which can be captured by the model-based latent-state model and may be particularly important in the acquisition and revision of human fear. Although disruptions in model-based processes that scaled with PTSD symptoms occurred within brain regions or networks that were not anticipated (e.g., reduced encoding within visual processing regions during latent-state belief updates), our results provide preliminary evidence of model-based RL-related impairments in PTSD that are separate from other learning processes (e.g., value estimation). Our latent-state model also identified a pattern of value estimation encoding that is distinct from latent-state update encoding and that may disrupt normative fear and safety learning among individuals with assault-related PTSD. Critically, exposure-based therapy, which is a “gold standard” treatment for PTSD, depends on learning processes to extinguish fear (Hermans et al., 2005), and RL deficits identified in this study may affect treatment response. Given that even the best available treatments for PTSD have limited efficacy, with remission occurring for approximately half of individuals who receive treatment (Morina et al., 2014; Resick et al., 2002; Schnurr et al., 2007), we propose that future research examine the role of RL impairment in treatment-related outcomes among individuals with PTSD (IPV and non-IPV) using a model-based framework.