Humans have the remarkable ability to mentally travel in time (Suddendorf, 2013). This capacity for episodic simulation affords individuals the cognitive and behavioral flexibility to anticipate and evaluate potential outcomes when making decisions (Buckner & Carroll, 2007; Gilbert & Wilson, 2007; Schacter, Addis, & Buckner, 2008). This flexibility is especially relevant to deliberative decision making, which entails the mental simulation and evaluation of distinct, imagined future possibilities. Many choices that humans make are foraging decisions and involve choosing whether or not to take an available offer (i.e., the foreground option), as compared against unknown future outcomes (i.e., the background; Charnov, 1976; Stephens, 2008). Foraging decisions occur in a variety of real-world contexts; for example, arctic hunters forage for food in harsh environments (Smith, 1991), taxi cab drivers forage for passengers in a city (Camerer, 1997), humans decide what food to purchase (Riefer, Prior, Blair, Pavey, & Love, 2017), and drug users forage for heroin in a black market (Hoffer, Bobashev, & Morris, 2009). However, it remains unknown what role (if any) deliberative decision making plays in foraging behaviors.

Foraging decisions are often characterized by the prey selection model, in which one makes sequential accept/refuse choices (Stephens, 2008; Wikenheiser, Stephens, & Redish, 2013). Limited resources impose trial interdependence across a session, so that maximizing gains depends on comparing current offers against expected but unknown future options, and resources spent on one offer are not available for future offers. Many such experimental tasks include a time constraint, asking subjects to maximize gains within a specific time window, where the time (or another limited resource) spent on one option is then unavailable for future options.

A fixed economy can reveal a subject’s preferences by measuring their willingness to endure the cost to attain some but not all rewards. For instance, during the neuroeconomic Restaurant Row task (Steiner & Redish, 2014), rats have a limited time to cycle between four feeders and collect different flavored food pellets available after variable delays. Rats reveal their preferences (or thresholds) by being willing to wait for different delays for each flavor; in turn, good offers are those with delays below threshold, and bad offers are those above threshold. A key aspect of the Restaurant Row task is that the flavor order is held constant, whereas the delays are random—that is, rats know the location of the flavors but not the specific delays they will encounter on arrival. Thus, to accept the current cherry offer would mean that a rat would have less time available to spend at the chocolate restaurant that would come up next. Critically, the sequential task design separates out the past (the offer just left), the current time (the offer available to the rat), and the future (the next offer that will be available); neural signatures can then be tracked in a circular format, to indicate whether a rat was contemplating the current versus alternative offers. This sequential structure has led to novel discoveries regarding deliberation and regret in rats—that is, scenarios in which a rat turned down a good offer on the previous trial only to encounter an unfavorable offer on the subsequent trial.

Human neuroimaging and nonhuman neurophysiology findings inform our investigation of the neural circuits that underlie human episodic simulation during deliberation. Higher-level visual cortices may support the representation (and distinguishing) of complex visual stimuli (Haxby et al., 2011; Norman, Polyn, Detre, & Haxby, 2006), including such regions as the lateral occipital cortex and fusiform gyrus (Grill-Spector & Weiner, 2014). Sensory cortices may also play a role. For instance, Doll, Duncan, Simon, Shohamy, and Daw (2015) found that binary decisions that depend on planning activate the sensory cortical representations of those outcomes; this finding is consistent with evidence that overlapping neural systems are involved in past recall and future simulation (Hassabis & Maguire, 2007; Schacter & Addis, 2007a; Schacter et al., 2012) and when imagining and perceiving a stimulus (Pearson, Naselaris, Holmes, & Kosslyn, 2015). Additionally, human fMRI studies and neural recordings from rodents have implicated the hippocampus and parahippocampus in deliberation (Buckner & Carroll, 2007; Hassabis, Kumaran, Vann, & Maguire, 2007; Redish, 2016). The hippocampus supports the formation of internal cognitive maps and the evaluation of potential options (Kaplan, Schuck, & Doeller, 2017; Wang, Cohen, & Voss, 2015): Individuals may use cognitive maps during deliberation to extract key information from prior experiences, to guide future choices, and to more efficiently encode new experiences.

In nonhuman animals, deliberation is intimately tied to the difficulty of a choice. Rats faced with difficult choices, those just above the decision threshold, spend more time deliberating over those choices (Steiner & Redish, 2014; Sweis, Abram, et al., 2018a) and exhibit a behavioral process termed “vicarious trial and error” (VTE; Tolman, 1939). Similarly, humans making difficult choices show lengthened reaction times (Shenhav, Straccia, Cohen, & Botvinick, 2014). Because VTE is implicated during uncertainty and in difficult decisions, it is thought to capture the indecision that underlies deliberation (Redish, 2016). During VTE, hippocampal place cells show forward-sweeping representations that alternate between options, suggesting that the rat is mentally simulating possible outcomes (Johnson & Redish, 2007). These forward sweeps are evident during challenging choices requiring more deliberation and fade out as decision behaviors become more automated (Johnson & Redish, 2007; K. S. Smith & Graybiel, 2013; van der Meer, Johnson, Schmitzer-Torbert, & Redish, 2010). The hippocampus may serve an analogous role in humans, although this theory has not been tested directly.

In the present study, we employed a human version of the Restaurant Row task, called the “Web-Surf Task” (Fig. 1; Abram, Breton, Schmidt, Redish, & MacDonald, 2016). Humans had a fixed amount of time to forage for videos across four serially presented “galleries” (i.e., video categories: kittens, dance, landscapes, and bike accident videos). On the basis of previous data that animals, tools, bodies, and scenes are represented differently within cortical circuits (Haxby et al., 2011; Haxby et al., 2001; Reddy, Tsuchiya, & Serre, 2010; Tong & Pratte, 2012), we first hypothesized that we could differentiate representations of the four categories via fMRI. After confirming that humans made internally consistent foraging decisions and that the neural representations of categories were dissociable, we tested for evidence of episodic simulation during deliberation. We hypothesized that in foraging decisions, deliberation should be more concerned with the upcoming offer (i.e., the foreground option) than with the alternatives (i.e., the background) and that such deliberation would be supported in visual cortices known to represent complex visual stimuli. Additionally, we anticipated that hippocampal regions would be involved in making difficult choices (requiring more deliberation).

Fig. 1

MRI Web-Surf Task layout and flow diagram. The flow diagram illustrates differences between a stay and a skip trial (left). If the subject stays (1), the subject waits through the delay, views the 4-s video clip (2), and rates the video (3). If the subject instead chooses to skip, the subject then proceeds through the cost phase (4) and arrives at the next offer (5). (Upper right) Schematic of the Web-Surf Task: Subjects had 35 min to cycle between the four video galleries in the depicted order. (Bottom right) Example of delay threshold computation for a single subject

Materials and method

Subjects

Twenty-nine healthy volunteers were recruited for the present study. Twenty-five of the subjects were retained for analysis (52% male; age range = 20–39 years old, mean age = 28 years, all right-handed), after excluding one subject for excessive head motion (i.e., mean movement greater than 3 mm, or 1.5 times the voxel size), one for claustrophobia, and two for invalid behavioral data. The subjects were recruited via Craigslist (an American classified advertisement website) and reported no prior history of neurological disease or severe mental illness and no first-degree relatives with a severe mental illness. Subjects completed a urine drug screen prior to participation, and only those with a negative screening continued. All subjects provided written informed consent, and the study procedures were approved by the Institutional Review Board at the University of Minnesota.

Web-Surf Task layout

Subjects had 35 min to cycle between four video categories (i.e., kittens, dance, landscapes, and bike accidents) presented using PsychoPy (Peirce, 2009). The categories were indicated by the symbol at the top of the screen (Fig. 1), and appeared in a fixed order. On arrival at a category, subjects were presented with a “download bar” (the offer) that indicated how long they would have to wait (delays ranged from 3 to 30 s) for a given reward (i.e., 4-s video clip). If they elected to stay, the delay counted down, the subject watched the video clip, and then rated it on a 1–4 scale. If the subject chose to skip, the subject proceeded to the next category and received a new offer. The period between the start of a trial (i.e., the arrival at a category) and the subject’s stay/skip choice was defined as the decision, and the period between the start of a video and the subject’s rating was defined as consumption. When traveling between categories, subjects had to click on the numbers 1–4 as they randomly appeared around the screen; this represented a travel cost. The numbers were presented in dark gray against a gray screen, to increase the difficulty. Trials were presented in 9-min blocks, with 45 s of a fixation crosshair shown between blocks. All subjects completed practice both in- and outside the scanner.

Web-Surf preview task

Before the main task, subjects completed a preview task that presented a fixed set of ten 4-s video clips from each category; the categories appeared in the same order as in the main task, and the video clips were randomly ordered within each category. Subjects rated each video using the same scale as in the main task. A fixation crosshair appeared between videos for 3–6 s (the durations were randomized). The total task time was approximately 7 min. This task provided baseline estimates of preference and neural activation for each category, in case a subject were to skip all offers from a particular category during the main task.

fMRI data acquisition and preprocessing

Neuroimaging data were collected using a 3-T Siemens MAGNETOM Prisma with a 32-channel head coil at the University of Minnesota’s Center for Magnetic Resonance Imaging. A high-resolution T1-weighted (MPRAGE) scan was collected for registration [repetition time (TR) = 2.5 ms; echo time (TE) = 3.65 ms; flip = 7°; voxel = 1 × 1 × 1 mm]. The main task data were collected using a single whole-brain echo-planar imaging (EPI) run, with the following sequence parameters: TR = 720 ms, TE = 37 ms, flip angle = 52°, voxel size = 2 × 2 × 2 mm (approximately 3,500 volumes), multiband factor = 8; the same parameters were used for the preview task EPI sequence (approximately 500 volumes), and an additional short reverse phase-encoded EPI sequence was used for distortion correction (ten volumes). The entire scanning session lasted 1 h. The order of acquisitions was as follows: three-axis localizer scan, AA Scout [aligned slices to the anterior commissure–posterior commissure (AC–PC) line], T1 MPRAGE, reverse phase-encoded EPI sequence, preview task, and the Web-Surf Task. There was no set spacing between scans.

We carried out standard preprocessing using the FMRIB Software Library (FSL version 5.0.8; Jenkinson, Beckmann, Behrens, Woolrich, & Smith, 2012), which included brain extraction, motion correction (see Footnote 1), prewhitening, high-pass temporal filtering with a sigma of 50 s, spatial smoothing with a 6-mm full-width-at-half-maximum Gaussian kernel, and spatial normalization and linear registration to the Montreal Neurological Institute (MNI) 152 standard brain. We also employed FSL’s topup tool to correct for susceptibility-induced distortions. This entailed collecting a reverse phase-encoded EPI sequence with distortions going in the opposite direction, which was paired with both the main task and the preview task. The susceptibility-induced off-resonance field was estimated from these pairs using a method similar to that described in Andersson, Skare, and Ashburner (2003). The images were then combined into a single corrected image.

Value computations

Value was computed as the category-specific threshold minus delay, where thresholds indicated the delay time at which a subject reliably began to skip offers for a particular category. Delay thresholds were computed separately for each trial, per category, using a leave-one-out approach: To obtain the threshold for trial i, we fit a Heaviside step function to all trials in category c excluding trial i. This produced a vector of thresholds with length equal to the number of trials in category c, and we used the average of that vector in subsequent analyses. We used a Heaviside step function as an alternative to the logistic fit function described in Abram et al. (2016) because the Heaviside step function can better handle extreme cases (i.e., when a subject stays or skips all offers in a category). In such instances, the Heaviside step function produces a reasonable value (e.g., 3 or 30) reflecting the range of possible delays, whereas the logistic fit function is likely to produce a value approaching infinity. Values then ranged from approximately −27 to 27 (e.g., a 30-s delay with a threshold of 3 s vs. a 3-s delay with a threshold of 30 s), and a value of 0 meant that the offer was equal to the threshold.
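
As a minimal sketch of the leave-one-out threshold procedure described above, the following Python fits a step function by searching integer candidate thresholds and keeping the one that misclassifies the fewest stay/skip choices. The grid search, the averaging of tied candidates, and all function names are illustrative assumptions and are not taken from the authors' code.

```python
def fit_step_threshold(delays, stays, lo=3, hi=30):
    """Fit a Heaviside step model: predict 'stay' when delay <= threshold.

    Assumption: candidate thresholds are the integer delays lo..hi, and
    ties between equally good candidates are resolved by averaging.
    """
    best_errors, best = None, []
    for theta in range(lo, hi + 1):
        # Count choices that the step function at theta gets wrong.
        errors = sum((d <= theta) != bool(s) for d, s in zip(delays, stays))
        if best_errors is None or errors < best_errors:
            best_errors, best = errors, [theta]
        elif errors == best_errors:
            best.append(theta)
    return sum(best) / len(best)


def leave_one_out_threshold(delays, stays):
    """Average the thresholds obtained by leaving out each trial in turn."""
    thresholds = []
    for i in range(len(delays)):
        kept_delays = delays[:i] + delays[i + 1:]
        kept_stays = stays[:i] + stays[i + 1:]
        thresholds.append(fit_step_threshold(kept_delays, kept_stays))
    return sum(thresholds) / len(thresholds)


def offer_value(threshold, delay):
    """Value as defined in the text: category threshold minus offered delay."""
    return threshold - delay


# Hypothetical single-category data: delays in seconds, 1 = stay, 0 = skip.
delays = [3, 5, 8, 10, 12, 15, 20, 25, 30]
stays = [1, 1, 1, 1, 1, 0, 0, 0, 0]
threshold = leave_one_out_threshold(delays, stays)
values = [offer_value(threshold, d) for d in delays]
```

Because the candidates are bounded by the range of possible delays, a subject who stays on every offer still receives a finite threshold under this sketch, which is the property the text contrasts with a logistic fit.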

Behavioral validity analyses

We first asked whether humans made internally consistent foraging decisions, by correlating the subject-derived delay thresholds with measures of reward liking (i.e., average category ratings, post-test category rankings). These methods were the same as those previously described (Abram et al., 2016).

Learning analyses

We next investigated cross-session learning effects, given that subjects skipped more as the session progressed, potentially due to satiation (see Footnote 2). We were specifically interested in whether we could detect behavioral changes as subjects became more familiar with the task (i.e., a switch from more to less deliberative decision processes). To this end, we compared decision reaction times against trial number and then conducted follow-up analyses to understand how learning impacted the relations between reaction times and value. We performed the same analyses using video-rating reaction times, to assess whether the cross-session shifts were similar for the decision and consumption phases. We also examined the extent to which the video ratings fluctuated across the session, with consideration of subject-specific preferences.

Preview task general linear model analyses

Functional data from the preview task were first analyzed at the group level, using a whole-brain voxel-wise general linear model (GLM) approach to assess for category-relevant activation. Here we used the fMRI Expert Analysis Tool (FEAT) within FSL. We modeled the video viewing and rating for each category as separate events (yielding four regressors of interest), along with the six standard motion parameters as confound regressors. The events were convolved using a double-gamma hemodynamic response function (HRF) and a threshold of z > 3.1, and a whole-brain, corrected cluster-extent threshold of p < .01 was applied.
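
To illustrate what convolving events with a double-gamma HRF involves, here is a minimal standard-library sketch that builds one regressor from event onsets. The gamma shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used textbook values adopted here as assumptions; they are not claimed to match FSL's exact defaults, and the function names are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, defined as 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1 / 6):
    """Positive response minus a scaled, later undershoot (assumed parameters)."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def make_regressor(onsets, durations, n_vols, tr, kernel_seconds=32.0):
    """Convolve a boxcar of events (sampled every TR) with the HRF."""
    boxcar = [0.0] * n_vols
    for onset, duration in zip(onsets, durations):
        for v in range(n_vols):
            if onset <= v * tr < onset + duration:
                boxcar[v] = 1.0
    kernel = [double_gamma_hrf(k * tr) for k in range(int(kernel_seconds / tr))]
    regressor = []
    for v in range(n_vols):
        n_terms = min(v + 1, len(kernel))
        regressor.append(sum(boxcar[v - k] * kernel[k] for k in range(n_terms)))
    return regressor


# One hypothetical 4-s video starting at 10 s, sampled at TR = 0.72 s.
regressor = make_regressor(onsets=[10.0], durations=[4.0], n_vols=60, tr=0.72)
```

In a full GLM, one such regressor per category would sit alongside the motion confounds in the design matrix; this sketch covers only the convolution step.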

Decoding validity analyses

We used a multivoxel pattern analysis decoding method, which offers a unique approach for probing episodic memory in humans and is useful for identifying category-specific representations. In particular, we employed the sparse multinomial logistic regression (SMLR; Krishnapuram, Carin, Figueiredo, & Hartemink, 2005) classifier, available in the PyMVPA machine-learning package (for multivariate pattern analysis in Python, http://www.pymvpa.org; Hanke et al., 2009). We selected this classifier because of its computational efficiency and good classification performance (Krishnapuram et al., 2005; Sun et al., 2009). The SMLR classifier utilizes multiple regression to predict the logarithm of the odds ratio of belonging to a particular class. This ratio is then transformed into a probability via a nonlinear transfer function that ensures that all classification probabilities sum to 1. The sparsification component promotes a more parsimonious and generalizable solution. For the present analyses, we used the default lambda penalty setting (λ = 1).
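
The transfer step described above, in which per-class linear scores become probabilities that sum to 1, can be sketched with a softmax function. This is a generic illustration of multinomial logistic prediction, not PyMVPA's SMLR implementation; the weights, biases, and feature values are made up, and the zero weights stand in for what a sparsity penalty tends to produce.

```python
import math


def softmax(scores):
    """Map real-valued class scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(features, weights, biases):
    """Linear score per class, followed by the softmax transfer function."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical example: three voxels, five classes (four videos + non-video).
features = [0.8, -0.2, 1.5]
weights = [
    [1.2, 0.0, 0.0],
    [0.0, 0.0, 0.9],
    [0.0, -0.7, 0.0],
    [0.0, 0.0, 0.0],
    [-0.5, 0.0, 0.0],
]
biases = [0.0, 0.0, 0.0, 0.0, 0.0]
probabilities = class_probabilities(features, weights, biases)
```

With five classes, identical scores yield a probability of .20 for each class, which is the chance level used throughout the decoding analyses.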

Decoding was conducted on a subject-by-subject basis and included the previously preprocessed fMRI data. We trained the classifier on the preview task data for all decoding analyses, because (1) each subject saw the same set of videos during the preview task (see Footnote 3), (2) the preview task contained trials from every category (whereas subjects could elect to skip all videos from a category during the main task), and (3) this provided a separate training set, so we did not have to create a cross-validation set from the main-task data.

In Step 1, we determined whether stimuli from the four categories were distinguishable via SMLR decoding using only the preview task data, as the subsequent analyses hinged on successful category separation. The first step in this process entailed fitting a GLM to the preview task data, to obtain linear model activity estimate images (i.e., parameter estimates), which were then supplied as examples to the classifier. Each video category was modeled as a separate event, and we also included a regressor to account for the fixation periods between the videos; this event was considered the other category and provided a control from which to compare the four video categories. For this analysis, samples were “chunked” to create groups of samples for cross-validation, each of which included two video samples from each category, as well as the fixation periods between those samples; the scan duration of a chunk ranged from approximately 60 to 80 s, depending on the fixation lengths between videos. All trials in a given loop (or complete pass through all four categories) were included in the same chunk, as well as four trials from a different loop. We averaged two samples per category when forming chunks, as this approach produces less noisy examples (Pereira, Mitchell, & Botvinick, 2009). After fitting the model, we z-scored the data separately for each chunk (see Footnote 4). We performed 60/40 cross-validation (i.e., we left two chunks out) on the preview task parameter estimate maps (see Footnote 5).
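
The chunking and leave-two-chunks-out scheme can be sketched as follows. The sketch assumes five chunks (ten preview videos per category, two per chunk), which is an inference from the text rather than a stated figure; under that assumption there are ten possible train/test splits, each using three chunks (60%) for training and two (40%) for testing. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def leave_two_chunks_out(chunk_ids):
    """Yield (train_indices, test_indices) for every pair of held-out chunks."""
    chunks = sorted(set(chunk_ids))
    for held_out in combinations(chunks, 2):
        train = [i for i, c in enumerate(chunk_ids) if c not in held_out]
        test = [i for i, c in enumerate(chunk_ids) if c in held_out]
        yield train, test


def zscore_within_chunk(values, chunk_ids):
    """Standardize one feature separately inside each chunk."""
    scaled = list(values)
    for chunk in set(chunk_ids):
        idx = [i for i, c in enumerate(chunk_ids) if c == chunk]
        chunk_values = [values[i] for i in idx]
        mu, sd = mean(chunk_values), pstdev(chunk_values)
        for i in idx:
            scaled[i] = (values[i] - mu) / sd if sd > 0 else 0.0
    return scaled


# Assumed layout: 5 chunks x 5 samples (four video categories + fixation).
chunk_ids = [chunk for chunk in range(5) for _ in range(5)]
splits = list(leave_two_chunks_out(chunk_ids))
```

Holding out whole chunks, rather than individual samples, keeps temporally adjacent samples from appearing in both the training and test sets.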

In Step 2, we tested whether we could predict which video a subject had viewed during the main task after training the classifier on the preview task data. We again fit a GLM to acquire a parameter estimate map for the video consumption time; we scaled the resulting parameter estimates to the training data, as the training data included an equal number of data points per condition. We then predicted which category the consumption time best represented (i.e., kittens, dance, landscape, bike accidents, or non-video) and extracted the corresponding probability estimates (one per category). The final step entailed combining the subject-specific data and reorganizing the probabilities according to the subjects’ locations within the loop of video categories. Given that the categories were presented in a fixed cycle (e.g., kittens, dance, landscapes, bike accidents, and then back to kittens), we could organize the decoding results by using previous, current, next, opposite, or non-video labels to indicate a subject’s location within that cycle, and track the past, current, and future representations as the subject traversed the task. As an example, for a subject at the kitten category, the obtained probabilities were labeled as follows: bike accidents (previous), kittens (current), dance (next), and landscapes (opposite). For a subject at the dance category, the obtained probabilities were labeled as follows: kittens (previous), dance (current), landscapes (next), bike accidents (opposite), or non-video (when the data did not correspond to any of the video category neural signatures).
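
The relabeling of category probabilities by position in the fixed cycle, as described above, can be sketched in a few lines. The category order follows the text; the dictionary layout, the function name, and the probability values are illustrative assumptions.

```python
# Fixed presentation order of the four galleries, as given in the text.
CATEGORY_ORDER = ["kittens", "dance", "landscapes", "bike accidents"]

# Position of each location label relative to the current category.
LOCATION_OFFSETS = {"previous": -1, "current": 0, "next": 1, "opposite": 2}


def label_by_location(current_category, probabilities):
    """Re-key per-category decoding probabilities by cycle location.

    `probabilities` maps each category name (plus "non-video") to the
    classifier's probability estimate for one trial.
    """
    idx = CATEGORY_ORDER.index(current_category)
    n = len(CATEGORY_ORDER)
    labeled = {
        location: probabilities[CATEGORY_ORDER[(idx + offset) % n]]
        for location, offset in LOCATION_OFFSETS.items()
    }
    labeled["non-video"] = probabilities["non-video"]
    return labeled


# Hypothetical probabilities for one trial at the kitten gallery.
trial_probs = {
    "kittens": 0.30,
    "dance": 0.25,
    "landscapes": 0.15,
    "bike accidents": 0.10,
    "non-video": 0.20,
}
at_kittens = label_by_location("kittens", trial_probs)
at_dance = label_by_location("dance", trial_probs)
```

The modular arithmetic handles the wrap-around from bike accidents back to kittens, so the same function applies at every point in the cycle.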

We computed mixed-effect logistic models to compare probabilities between the locations using the lme4 package in R (Bates, Maechler, Bolker, & Walker, 2014). Specifically, we regressed the SMLR probabilities on location, with subject as a random effect. In the main text we report F statistics to indicate whether there were overall probability differences between the locations. For significant overall models, we used two-tailed chi-square tests to determine which locations had probabilities above chance (i.e., 1/5 = .20, given four video categories and one control category). Finally, for locations with probabilities above chance, we used follow-up one-tailed chi-square tests to determine whether that location’s probabilities were greater than those of the additional locations (e.g., for a model with only the current category attaining probabilities above chance, the contrasts would be current > previous, current > next, current > opposite, and current > non-video). For the follow-up between-location comparisons, we report the original p values as well as false discovery rate (FDR)-adjusted p values, using Benjamini and Hochberg’s FDR control algorithm (Benjamini & Hochberg, 1995).
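
For readers unfamiliar with the FDR adjustment, the Benjamini and Hochberg (1995) step-up procedure can be sketched as below. This is a generic implementation of the published algorithm, not the authors' analysis code, and the p values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p values in the original order.

    Each p value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downward keeps the adjusted values
    monotonic.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values from four follow-up contrasts.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```

A contrast is then treated as surviving correction when its adjusted p value falls below the chosen alpha level.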

Decoding deliberation analyses

Our primary aim was to test for evidence of deliberation during decision making. We employed an approach comparable to that described under Step 2, but instead we fit GLMs to the decision phase of the main-task data when acquiring the activation maps for the classifier (again scaling these maps to the training data).

We used mixed-effect logistic models and chi-square tests (as described above) to first identify which location(s) (i.e., previous, current, next, opposite, or non-video) were represented best across all the trials, and then by different decision conditions (e.g., stay vs. skip choices). In follow-up analyses, we capitalized on the sequential task structure by evaluating the relations between serial actions and deliberation, on the basis of foundational work by Steiner and Redish (2014) using the rodent Restaurant Row task. To this end, we used mixed-effect logistic models to compare decision-decoding probabilities and decision reaction times across four conditions: skip previous + skip current, stay previous + skip current, skip previous + stay current, and stay previous + stay current; importantly, we considered the value (i.e., threshold – delay) of the past offer, since the decision to reject good versus bad offers should yield differential effects (Steiner & Redish, 2014; Sweis, Redish, & Thomas, 2018b; Sweis, Thomas, & Redish, 2018c).
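
The four serial-action conditions can be derived from the trial sequence as in the sketch below. The field names, and the rule that a past offer counts as good when its value is above 0, are assumptions made for illustration; the choices and values shown are hypothetical.

```python
def label_sequences(choices, values):
    """Pair each trial with the preceding choice and past-offer value.

    `choices` holds "stay" or "skip" per trial in order; `values` holds the
    matching offer values (threshold minus delay). The first trial has no
    predecessor and is omitted.
    """
    labeled = []
    for t in range(1, len(choices)):
        labeled.append(
            {
                "trial": t,
                "condition": f"{choices[t - 1]} previous + {choices[t]} current",
                # Assumption: value > 0 marks a good (below-threshold) offer.
                "past_offer_good": values[t - 1] > 0,
            }
        )
    return labeled


choices = ["skip", "skip", "stay", "stay"]
values = [5.0, -3.0, 2.0, 1.0]
sequences = label_sequences(choices, values)
```

In this example the first labeled trial follows a skipped good offer, the kind of sequence the rodent work associates with regret.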

Choice difficulty general linear model analyses

Finally, we compared the decoding results above with those from a traditional GLM approach that pinpointed activation related to difficult choices. This model included four regressors (choice, delay, video viewing/rating, and travel), plus motion parameters. We weighted each decision and video-viewing event by its distance from the respective category threshold, such that events closer to threshold were weighted more heavily. Decisions in this model were isolated to the last second of the choice phase (given that reaction times differed systematically by value). Events were convolved using a double-gamma HRF and evaluated with a threshold of z > 3.1 and a corrected cluster-extent threshold of p < .01.

Results

Choices predict reward likability

Initial behavioral analyses revealed that people typically made choices consistent with offer value (i.e., threshold minus delay, where thresholds represented the point at which a subject reliably began to skip offers from a particular video category; see Fig. S1 for plots of each subject’s thresholds). Foraging decisions conformed to a sigmoid pattern, in which subjects typically accepted offers with positive value (i.e., delays shorter than threshold) and declined those with negative value (Fig. S2A). This suggests that our threshold metric was a good indicator of value-based decisions. To test the correspondence between subjects’ decisions and their liking of rewards, we correlated the four category thresholds with average category ratings and post-test category rankings separately. We found that 76% of the average rating correlations were above .5, and 88% of the post-test ranking correlations were above .5 (Fig. S2B); these values were comparable to those previously reported (Abram et al., 2016). We also noted that, on average, subjects rated all categories between 2 and 3 (Fig. S3), indicating that subjects generally found the video stimuli rewarding.

Cross-session characteristics

We observed a strong downward trend in decision reaction times as the session progressed (β = −.002, p < .001; Fig. 2A). In the first half of the session, reaction times were consistently slow for low-valued trials, as compared to a more peaked formation around threshold in the second half. Furthermore, reaction times were lower overall in the second half (paired t = 11.92, p < .001). These patterns could reflect the process of adjusting one’s threshold, with subjects showing a much clearer understanding of their thresholds later in the session.

Fig. 2

Cross-session behavioral shifts. (A) Decision reaction times show the steepest decline in the first 50 trials (left); reaction times were consistently slow for low-valued trials in the first half of the session (middle) versus a sharp peak at threshold for the second half (right). (B) Video rating reaction times did not decline significantly across the session (left), and video rating reaction times were less driven by value than decision reaction times (middle and right). Thresholds are indicated by the vertical lines at 0, and shaded bands represent 95% confidence intervals

We also point out that the increased reaction times for low-valued trials near threshold are analogous to the VTE pattern observed in rats and mice during the Restaurant Row task (Steiner & Redish, 2014; Sweis, Abram, et al., 2018a): Subjects took longer to make choices for offers that approached threshold, and were fastest for those offers well above or below threshold (Fig. 2A). Consistent with the rodent data, reaction times remained high for lower-value offers (delays above threshold) more than for higher-value offers (delays below threshold) during the first half of the session; this suggests that offers just above threshold (i.e., small negative values) were especially challenging and required more deliberation.

In comparison to decision reaction times, rating reaction times did not decline significantly as a function of trial number (β = .000, p = .15; Fig. 2B). Rating reaction times were also less impacted by offer value, suggesting that learning is more relevant to decision making than to post-consumption evaluation.

Subjects also showed a slight decline in ratings across the session (β = −.002, p < .001; Fig. S4), with relatively similar declines across the four categories; however, when considering rating shifts by preference, we see that the two top preferred categories showed a sharper drop off in the second half of the session than did the lesser preferred categories.

Distinguishability of video categories

To test for deliberation, it was critical that we could discriminate the four video categories on the basis of their neural signatures. An initial evaluation of the preview task data showed similar activation patterns across the video categories (Fig. S5; Table S1; regions included the anterior insula, pre-/post-central cortex, hippocampus/parahippocampal gyrus, anterior cingulate cortex (ACC), lateral occipital cortex, lingual gyrus, and inferior frontal gyrus). Given the large overlap in activation across the categories during the preview task, we created a cumulative mask for decoding that entailed merging the four main effect maps—i.e., the preview task mask (Fig. 3). However, because the preview task mask extended beyond the visual cortex (into more anterior substrates), we compared its decoding performance to that from a second mask—i.e., the preview task visual mask; we restricted this mask to higher-level visual regions known to represent complex visual stimuli (Haxby et al., 2011).

Fig. 3

Decoding masks: Illustrations of the two masks used for the decoding analyses. The preview task visual mask is a restriction of the preview task mask that includes only visual areas. ACC = anterior cingulate cortex; Ant = anterior; Inf = inferior; Lat = lateral; Mid = middle; Occ = occipital; Parahippo = parahippocampal; Temp = temporal

Initial validity analyses showed that the visually restricted mask outperformed the broader preview task mask for dissociating categories (Fig. 4). First, the preview task visual mask had better accuracy overall when decoding the preview task data (Fig. 4A), with approximately 53% accuracy for predicting each video category, roughly 80% accuracy for classifying the fixation periods between videos (i.e., the control condition), and an overall accuracy of 58% (as compared to a chance level of 20%). Thus, the stimuli were distinguishable via decoding, despite their spatially similar activation maps (see Footnote 6).

Fig. 4

Validity of the decoding method. (A) The four video categories were dissociable using our decoding methods for both the preview task and preview task visual masks. Predictions were based on training with 60% of the sample and testing on 40% of the sample over ten iterations. Probabilities were based on four video and one control (non-video) category, yielding a chance level of 20%. (B) Decoding using the preview task and preview task visual masks represented the current category during video consumption. Chance is indicated by the horizontal black lines at .2. Error bars reflect within-subjects standard errors, and asterisks reflect locations with probabilities significantly different from chance (five follow-up χ2 tests performed per each significant model: *p < .05. **p < .01. ***p < .001). BA = bike accidents; D = dance; K = kittens; L = landscapes; NV and Non-vid = non-video

Decoding of the main task consumption phase (i.e., during video viewing) was also used as a positive control, given that both the training and test data entailed video viewing. Figure 4B shows that decoding of consumption-related activation using the preview task (PT) mask indicated a significant difference between locations [F(4, 7954) = 31.70, p < .001], with the best representation being at the current location (i.e., the location with probability above chance; Mcurr_PT = .277, SE = .009; χ2 = 58.19, p < .001; Table S2) and the non-video location falling below chance. In comparison, we observed significantly better decoding (see Footnote 7) using the preview task visual (PTV) mask [F(4, 7954) = 109.87, p < .001] (Mcurr_PTV = .285, SE = .005; χ2 = 71.31, p < .001; Table S2); we note that both models reported in Table S2 include four pairwise comparisons (i.e., current vs. the other four categories).

On the basis of the stronger performance of the preview task visual mask for decoding both the preview task and main task consumption data, all remaining analyses focused solely on this mask.

Visual cortices track upcoming and future locations during foraging decisions

Decoding using the preview task visual mask revealed the strongest representation of the next location [F(4, 15259) = 14.51, p < .001; Fig. 5A; Table S3] (Mnext_PTV = .23, SE = .007; χ2 = 17.67, p < .001), followed by the current location (Mcurr_PTV = .22, SE = .007; χ2 = 5.51, p = .02); we then performed seven follow-up pairwise comparisons based on this model (Table S3). We also detected a significant Location × Choice interaction [F(4, 15259) = 3.89, p = .004]; post-hoc analyses showed representations of both the current (Mcurr_PTV = .22, SE = .01; χ2 = 5.46, p = .02) and next (Mnext_PTV = .22, SE = .01; χ2 = 5.20, p = .02) locations to be the strongest during skip decisions [F(4, 7289) = 5.50, p < .001; Fig. 5B; Table S3], whereas for stay decisions [F(4, 7949) = 12.05, p < .001; Table S3, Fig. 5B], the next location was strongest (Mnext_PTV = .24, SE = .009; χ2 = 13.00, p < .001), followed by the previous location at a trend level (Mprev_PTV = .22, SE = .009; χ2 = 3.66, p = .06); we similarly conducted seven pairwise comparisons for each of the stay and skip models (Table S3).

Fig. 5

Decision decoding with the preview task visual mask. (A) Decoding using the preview task visual mask during the decision phase best represents the current and next locations for all trials collectively. (B) Decoding using the preview task visual mask during the decision phase best represents the current and next locations for skip trials (top), as compared to the previous and next locations for stay trials (bottom). Probabilities were based on four video and one control (non-video) category; chance is indicated by the horizontal black lines at .2. Error bars reflect within-subjects standard errors, and asterisks reflect locations with probabilities different from chance (five follow-up χ2 tests performed per each significant model: .p < .10; *p < .05; ***p < .001). Non-vid = non-video

We also note that representations were stronger during the first [F(4, 1647) = 16.00, p < .001; Fig. S7, Table S4] than during the second [F(4, 1411) = 2.62, p = .03] half of the session; see the supplemental materials, Cross-Session Shifts in Deliberation section.

Regret uniquely impacts deliberation while foraging

The prior analyses suggested that competing representations of the past and future outcomes guide an agent’s choice to stay or skip (e.g., on a skip choice, subjects may ponder: “do I accept this offer or try my luck at the next gallery?”), but to what extent do that agent’s past actions impact the deliberation? If we consider time a limited resource, then trials become interdependent, and past actions might impact future decision making. To answer this question, we first tested whether past decisions impacted the neural representations during the decision time. As is shown in Fig. S8, rejection of a good offer (value > 0) on the last trial was associated with the strongest representation of the current location, particularly when the subject also chose to skip the current offer (orange box in top left cell of the figure). This suggests that a subject’s realization that an error has just been made could lead to more deliberation about whether to reject the new offer. Subjects were also slowest when making a skip decision if they had skipped on the previous trial [F(3, 15155) = 40.03, p < .001], followed by skip decisions when they had stayed on the previous trial (Fig. S9; see the supplemental materials, Decision Times in Response to Sequential Choices section).

We hypothesized that this sequencing finding was akin to experiences of regret, defined here as the realization that one’s actions yielded an unfavorable result; that is, an alternative action would have led to a preferred outcome (Bell, 1982). More specifically, a regret-inducing scenario occurred when a subject skipped a high-value offer only to encounter a low-value offer on the subsequent trial. We thus explored the possibility that humans would show more deliberation following regret (than following other serial outcomes), using the criteria from Steiner and Redish (2014). Table S5 provides detailed descriptions of regret and the four comparison conditions, where Control 1 and Control 2 reflect disappointment (i.e., the agent encounters an unfavorable offer after making the correct choice on the last trial), and Rejoice 1 and Rejoice 2 reflect the receipt of good offers after skipping the previous trial. We used mixed-effects logistic models to test for between-condition effects (i.e., regret vs. controls) for each of the four locations plus the non-video control (i.e., previous, current, next, opposite, and non-video), yielding five models. We found that the current representations were strongest for regret-inducing scenarios [F(4, 2084) = 7.96, p < .001] (Mcurr_PTV_regr = .34, SE = .05; χ2 = 7.73, p = .005; Fig. 6), followed by Control 1 (Mcurr_PTV_ctrl1 = .25, SE = .03; χ2 = 8.63, p = .003). We then tested whether regret representations were greater than each of the four other conditions, using one-tailed tests; we found that the regret representations significantly exceeded each of the other conditions: Control 1 (z ratio = 2.08, p = .02, padj = .03), Control 2 (z ratio = 3.39, p = .0003, padj = .0006), Rejoice 1 (z ratio = 1.84, p = .03, padj = .03), and Rejoice 2 (z ratio = 4.51, p = .0001, padj = .0004). Comparatively, the neural representations following regret instances were not above chance for any of the other locations (Fig. S10); however, we did find overall differences for the opposite and non-video models, with Control 1 being above chance for the opposite model (Mopp_PTV_ctrl1 = .23, SE = .03; χ2 = 4.27, p = .04).
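The sequential definitions above can be made concrete with a short sketch that labels each trial by what happened on the preceding one. This is a coarse illustration only: the `Trial` structure, the assumption that offer value is the subject's threshold minus the offered delay (so that value > 0 marks a good offer), and the three-way labeling are hypothetical simplifications, and the finer distinctions between Control 1 and Control 2 and between Rejoice 1 and Rejoice 2 given in Table S5 are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    value: float  # assumed: threshold minus offered delay; > 0 is a good offer
    stayed: bool  # True if the subject accepted (stayed), False if skipped


def label_sequence(previous: Trial, current: Trial) -> str:
    """Coarsely label the current trial given the previous trial's offer and choice."""
    prev_good = previous.value > 0
    curr_good = current.value > 0
    prev_correct = previous.stayed == prev_good  # stay on good, skip on bad
    if not previous.stayed and prev_good and not curr_good:
        return "regret"          # skipped a good offer, now faces a bad one
    if prev_correct and not curr_good:
        return "disappointment"  # chose correctly, still faces a bad offer
    if not previous.stayed and curr_good:
        return "rejoice"         # skipped, then receives a good offer
    return "other"


def label_session(trials: List[Trial]) -> List[Optional[str]]:
    """Label every trial in order; the first trial has no predecessor."""
    if not trials:
        return []
    labels: List[Optional[str]] = [None]
    for previous, current in zip(trials, trials[1:]):
        labels.append(label_sequence(previous, current))
    return labels


session = [
    Trial(value=5.0, stayed=False),   # good offer, skipped (an error)
    Trial(value=-3.0, stayed=True),   # bad offer follows: regret-inducing
    Trial(value=-2.0, stayed=False),  # bad offer, correctly skipped
    Trial(value=4.0, stayed=True),    # good offer after a skip
    Trial(value=-1.0, stayed=False),  # bad offer after a correct stay
]
print(label_session(session))
```

Because regret requires an error on the previous trial and disappointment requires a correct choice, the two labels cannot apply to the same trial, which is what allows disappointment to serve as a control for the unfavorable offer itself.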

Fig. 6

Regret-inducing experiences enhance deliberation: Decision decoding probabilities from the current location using the preview task visual mask for regret versus the control conditions. Chance is indicated by the horizontal black line at .2. Error bars reflect within-subjects standard errors, and asterisks reflect locations with probabilities different from chance (five follow-up χ2 tests performed: **p < .01). The additional p values reflect one-tailed odds ratios that compare regret to the four control conditions
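As a worked check on the four one-tailed comparisons of regret against the other conditions, the sketch below applies a Benjamini-Hochberg (false discovery rate) adjustment to the reported, rounded p values. The text here does not name the correction that was used, so treating it as Benjamini-Hochberg is an assumption; the function name is likewise hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Unadjusted one-tailed p values as reported (rounded) for regret vs. each condition.
reported_p = {"Control 1": .02, "Control 2": .0003, "Rejoice 1": .03, "Rejoice 2": .0001}
adjusted = benjamini_hochberg(list(reported_p.values()))
for name, p_adj in zip(reported_p, adjusted):
    print(f"{name}: p_adj = {p_adj:.4f}")
```

Under this assumption the adjusted values come out at roughly .027, .0006, .03, and .0004, which agree with the reported padj values of .03, .0006, .03, and .0004 after rounding. Agreement on rounded inputs is suggestive only and does not confirm which procedure was applied.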

Neural activation for difficult choices

We then investigated which brain areas were associated with difficult choices on the main task. Figure 7A shows that decision making recruited the ACC, middle frontal gyrus (MFG), bilateral hippocampus, posterior cingulate cortex, and lingual gyrus (Table S6). Likewise, video viewing after the delay recruited the ACC, hippocampus, and visuospatial areas, as well as bilateral portions of the orbitofrontal cortex (OFC), nucleus accumbens, amygdala, insula, and thalamus; we note that in the case of consumption, these regions may reflect more intensive post-consumption valuation processes rather than greater difficulty (because the subject was long past the decision phase). An intersection mask revealed that both decisions and consumption evoked the ACC, bilateral hippocampus/parahippocampus, and visuospatial areas (Fig. S11); follow-up analyses showed that both decisions and video viewing led to increased hippocampal/parahippocampal activation (Fig. 7C). Although it is possible that signals from consumption were erroneously attributed to decisions (or vice versa), given the sluggish nature of the hemodynamic response and some shorter wait times (Lindquist, Meng Loh, Atlas, & Wager, 2009), we note that subjects had not always elected to view a video on the prior trial and that we intentionally introduced jitter between the trials (i.e., the cost phase shown in Fig. 1) to help separate these events.

Fig. 7

Neural activation related to difficult choices. (A) Activation main effects related to difficult choices during decision making (top) and consumption (bottom). (B) Contrasts showing differential activation for decision making and consumption, as related to difficult choices. (C) Contrasts reveal which cognitive and sensory areas were associated with difficult choices during decision making versus consumption (left). Both decision making and consumption recruited voxels within the hippocampus and parahippocampus (right). Error bars reflect between-subjects standard errors. ACC = anterior cingulate cortex; Cons = consumption; Decis = decision; MFG = middle frontal gyrus; OFC = orbitofrontal cortex

Finally, we contrasted choice and video viewing to determine the extent to which challenging offers were associated with different brain structures at different points in the decision process. Here we observed increased ACC and MFG activation during decision making, as compared to increased OFC and posterior insula activation during consumption (Fig. 7B and C).

Discussion

Recent theories have posited that humans engage in future-oriented thinking during deliberative decision making (Buckner & Carroll, 2007; Gilbert & Wilson, 2007; Kurth-Nelson, Bickel, & Redish, 2012; Schacter & Addis, 2011). This entails imagining rich and concrete future representations (Peters & Büchel, 2010; Redish, 2016). These processes have been directly observed in rats during foraging decisions (Steiner & Redish, 2014), and have yielded important insights as to how agents’ awareness that they have made an error—that is, an experience of regret—could drive episodic simulation during deliberation. Using the Web-Surf Task, a sequential-foraging paradigm with real-time costs and rewards, we discovered a set of human decision-making mechanisms indicative of deliberation while foraging. Our unique task design allowed us to differentiate the representations of the foreground versus background options as humans cycled between four video categories that appeared in a constant order but varied trial-by-trial with regard to the specific delay. We used multivoxel pattern analysis decoding to uncover categorical representations within a category-selective mask that contained key visual regions, and we found that choices following regret-inducing experiences led to better representations of the current offer.

Our initial decision decoding findings depict overall effects for all trials, and then separately for stay and skip choices. These results suggest the possibility of competing representations when agents make choices: We found the best representations of the current and next locations during skip trials, versus the best representations of the previous and next locations for stay trials. These initial results diverged somewhat from our expectation that foraging decisions should be more concerned with the current offer (i.e., the foreground option) than with the alternatives (i.e., background), implying that the current representations would exceed the other options. Instead, our results suggest that during skip decisions, subjects waver between whether the current offer is worth it or whether they should take their chances with the next offer; this could be explained by slower responses on skip trials (mainly during the first half of the session), suggesting more difficulty in rejecting versus accepting offers. Comparatively, the background offers (i.e., previous, next) were depicted better on stay trials, which might point to broader task representations coming online. These choice-specific differences might also be understood in terms of default foraging behaviors, in which the default action is to engage with an offer (whereas continued foraging requires an override of the default option; Kolling, Behrens, Mars, & Rushworth, 2012; Sweis, Abram, et al., 2018a; Sweis, Thomas, & Redish, 2018c). Our data suggest that the default option in our task would be to stay, and the nondefault to skip. It is possible that on skip trials, stronger current and next representations emerge as the subject overrides the default (and more automatic) action; this effect might be even more amplified in situations in which the subject elected to skip after having just skipped a high-valued offer (Fig. S8).

We also found that decoding of the different locations was more clearly distinguished in the first than in the second half of the session (Fig. S7). As with our reaction time results (Fig. 2A), we can conceptualize this finding as a shift from a deliberative to a more rule-based approach, whereby earlier trials in the task required more thoughtfulness. Subjects might think more deeply on earlier trials—that is, reflect on whether a particular offer is “worth it”—before thresholds are well-established. This finding also fits with the hypothesis that repeated trials with the same (or similar) questions can yield the development of “associations.” Subjects can then draw from the association while making a decision, rather than needing to retrieve an episode (Zentall, 2010). Perhaps as subjects gain experience with the task, they form associations with the category and delays that limit their reliance on episodic simulation processes.

A more nuanced assessment of our data also highlighted the need to account for how past actions influence deliberative processes. More specifically, we observed the strongest representations of the current option when a subject had rejected a good offer only to encounter a low-valued offer; this suggests that the awareness that an alternative action would have been better (i.e., an experience of regret; Bell, 1982) led to more thoughtfulness about the current option. This regret effect was greater than the disappointment control conditions, in which a subject encountered an unfavorable outcome but had not made an error (though we note that one of these control conditions had overall probabilities above chance, suggesting that deliberation may generally be needed for evaluating low-valued offers).

Mental time travel in the context of reinforcement learning

At least three cases have been suggested for modeling episodic simulation. These include deliberative model-based learning, in which a subject uses an internal map to guide goal-directed behaviors; reflexive model-free strategies, in which learning occurs outside of a model, and instead on a trial-and-error basis; and a hybrid of model-based and model-free approaches (Dyna-Q) that utilizes offline replays—that is, replays when the subject is not moving (Cazé, Khamassi, Aubin, & Girard, 2018; Redish, 2013; Schacter & Addis, 2007b; Suddendorf, 2013). Critically, model-based learning involves using past memories to construct future possibilities, which is most likely to occur during decision making, and thus most likely what we were observing in the present task. In comparison, the reactivation of representations driving model-free strategies, which involves updating expectations after recent feedback, is more likely to occur after consumption (e.g., phasic dopamine signals during learning; Foster & Wilson, 2006; Montague, Dayan, & Sejnowski, 1996; Schultz, Dayan, & Montague, 1997), whereas Dyna-Q learning likely occurs offline (Cazé et al., 2018; Johnson & Redish, 2005). As can be seen in Fig. 7C, the hippocampus can be activated during both decision and consumption—that is, providing prospection during decisions, and reactivation during consumption.
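To make the hybrid case concrete, the following is a minimal tabular Dyna-Q sketch in the textbook sense: each real step drives a model-free Q-learning update, the same step is stored in a learned model, and that model is then sampled for additional simulated ("replayed") updates. The five-state corridor, the parameter values, and all names are assumptions chosen for illustration; this is not a model fitted to the present data or one used in the cited studies.

```python
import random

N_STATES = 5        # corridor states 0..4; state 4 is the rewarded goal
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative values only


def step(state, action):
    """Deterministic toy environment: reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_update(q, state, action, reward, next_state):
    """One-step Q-learning (model-free) update; the goal state is terminal."""
    terminal = next_state == N_STATES - 1
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


def choose_action(q, state, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def dyna_q(n_episodes=30, n_planning=10, seed=0):
    """Run tabular Dyna-Q and return the learned action values."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (state, action) -> (reward, next_state)
    for _ in range(n_episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, rng)
            next_state, reward = step(state, action)
            q_update(q, state, action, reward, next_state)  # learn from real experience
            model[(state, action)] = (reward, next_state)   # update the internal model
            for _ in range(n_planning):                     # replay simulated experience
                (s, a), (r, s2) = rng.choice(list(model.items()))
                q_update(q, s, a, r, s2)
            state = next_state
    return q


q = dyna_q()
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)  # greedy action in each non-goal state
```

Setting `n_planning=0` reduces the sketch to plain model-free Q-learning, which shows where the two strategies differ: only the replay loop draws on the stored model.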

Prospection versus retrospection versus perception

Our results for episodic simulation during deliberation can also be framed in terms of prospection, retrospection, and perception (Schacter, Addis, & Buckner, 2008), a framework that in many ways maps onto the reinforcement-learning models described above. Episodic prospection, or the anticipation of future events, hinges on a system that can flexibly recombine elements of past experiences to guide decision making (i.e., model-based learning; Redish, 2016; Schacter & Addis, 2007a, 2007b; Zeithamova, Schlichting, & Preston, 2012). Here we found the strongest representations of the subsequent location (i.e., next) in several instances (Fig. 5), with the exception of regret scenarios (Fig. 6). This suggests that subjects engaged in future-oriented thinking while deliberating at the choice point. At least two features of our analysis support our theory: (1) Subjects had more complete knowledge of the current offer (i.e., they knew the type of video and delay) than of the subsequent offer (i.e., they knew the type of video but not the delay), and (2) we trained our classifier on a separate but categorically similar set of videos, meaning that subjects did not encounter identical video rewards during the training and test phases. Because subjects did not have complete knowledge of the subsequent offer (i.e., the delay was unknown and the specific video was always novel), we suspect that subjects utilized imaginative processes shaped by prototypical information (e.g., by imagining a typical instance of what a category event would be; Kane, Van Boven, & McGraw, 2012): “What offer might be available at the next gallery? How might I respond to a high versus low delay for that kind of video? Will I enjoy another kitten video as much as the last kitten video?” The notion of episodic prospection during deliberation also fits with recent findings that model-based choices involve prospective neural signals (Doll et al., 2015).

In contrast to prospection, retrospection (or episodic memory) entails the use of past memories to execute current decisions (Zentall, 2010). Many researchers have argued that imagining the future depends on recalling the past (Addis, Wong, & Schacter, 2008; Busby & Suddendorf, 2005; Kwan, Carson, Addis, & Rosenbaum, 2010; Mullally & Maguire, 2014). Recalling the past and imagining the future also evoke similar neural processes (Addis, Wong, & Schacter, 2007; Buckner & Carroll, 2007; Schacter et al., 2012), and in the visuospatial context, both visual memory and visual imagery may depend on similar regions, including occipital–temporal sensory areas (Slotnick, Thompson, & Kosslyn, 2012). As compared to prospection, retrospection may be more akin to retrieving than to reconstructing historical information (Kane et al., 2012). Given that subjects encountered a range of offers within each category (i.e., the delays were random), deliberation on this task seems more likely to have reflected flexible reconstruction processes rather than solely the reactivation or replay of past experiences.

Our findings regarding regret-inducing situations indicated the strongest representations of the current (vs. the next) location. One might argue that this finding is not evidence of prospection but instead of perception—that is, the mental representation of a current event is considered perception (Gilbert & Wilson, 2007). However, in our study we tested for deliberation at the choice point, at which subjects had a cue indicating the type of video and delay but were not actually yet experiencing the reward. This could mean that the representation of the current offer while at the choice point was still a form of episodic prospection (as one was imagining the experience of a future reward available after enduring some cost); we note, however, that simulation of future events is supported by some of the same underlying processes that support perception (Kosslyn, Ganis, & Thompson, 2001). We suspect that decoding during the consumption phase, when subjects were actually experiencing the video rewards, might be more akin to perception.

Prospection and planning

Prospection is an umbrella term that encompasses a range of future-oriented cognitions related to episodic simulation, including planning and remembering intentions (Schacter et al., 2008; Szpunar, Spreng, & Schacter, 2014, Fig. 1). Planning broadly entails the identification and organization of steps needed to achieve a specific goal, whereas episodic planning refers to the identification and sequencing of steps toward a specific autobiographical future event (Spreng, Gerlach, Turner, & Schacter, 2015). Autobiographical planning, in particular, draws from self-relevant memory and goal-directed planning processes, and the planning of specific autobiographical outcomes might evoke the same brain regions involved in prospection and goal-directed cognition (Szpunar et al., 2014). However, it is worth noting that contemplating future plans can actually create a cost to ongoing performance (Marsh, Hicks, & Cook, 2006; R. E. Smith, 2003); that is, actively maintaining future intentions can deplete current attentional resources. One way to reduce performance interference would be by associating the intention with a specific future context. Gollwitzer (1999) called this process “implementation intentions,” or plans that connect intentions with specific anticipated events—for instance, “if faced with a delay above 15 s on a bike accident video, I will skip”—versus broader goal intentions, such as “I intend to skip many of the bike accident videos.” It is possible that implementation intentions enhance performance because an intention is linked with a specific mental representation about the future that can later cue that intention (Schacter et al., 2008). Taken together, prospection and planning might intersect in the realm of implementation intentions. We theorize that the subjects in our task formulated these plans across the session, as evidenced by downward shifts in reaction times and diminished mental representations at the decision time.

Comparisons with the rodent literature

We detected several behavioral and neural cross-species parallels with respect to deliberative decision making: First, the reaction time patterns in humans were analogous to rodent VTE behaviors during the Restaurant Row task (Schmidt, Duin, & Redish, 2019; Steiner & Redish, 2014; Sweis, Thomas, & Redish, 2018c), as indicated by longer reaction times on offers just above threshold—that is, more difficult choices. This pattern is also analogous to the slower response latencies observed when humans make decisions with uncertain outcomes (Satterthwaite et al., 2007), which fits with notions that VTE reflects uncertainty that drives deliberation (Redish, 2016). We again note, however, that this pattern was more pronounced in the first half of the session for our human subjects.

Second, hippocampal task-based activation scaled with choice difficulty during decision making and consumption, revealing a novel neural signature of deliberation that translates across species. Difficult choices also recruited the ACC and MFG (including the dorsolateral prefrontal cortex) more strongly during the decision phase. These areas are involved in cognitive control and conflict monitoring, and they might respond to the uncertainty and error potential of difficult trials (Botvinick, Cohen, & Carter, 2004); previous research has also implicated the ACC in decision difficulty during a foraging task (Shenhav et al., 2014) and in tracking value in an uncertain reward environment (Behrens, Woolrich, Walton, & Rushworth, 2007). Moreover, the MFG is theorized to initiate VTE (Redish, 2016; Schmidt et al., 2019; Wang et al., 2015). This follows from rodent findings that disrupting hippocampal representations actually increases VTE, making the hippocampus an unlikely candidate for initiating the VTE process (Bett, Murdoch, Wood, & Dudchenko, 2015). Instead, the rodent prelimbic cortex, arguably homologous to aspects of human prefrontal cortex, might initiate this process, given its role in outcome-dependent decisions and its influence on goal-directed activity in the hippocampus (Ito, Zhang, Witter, Moser, & Moser, 2015; Killcross & Coutureau, 2003; Spellman et al., 2015). Findings from nonhuman primates that the dorsolateral prefrontal cortex generates action plans prior to action execution further support this theory (Mushiake, Saito, Sakamoto, Itoyama, & Tanji, 2006). As compared to decision making, consumption led to more activation in the lateral OFC for difficult trials. This aligns with rodent findings that have implicated the OFC in postdecisional outcome evaluation (Stott & Redish, 2014). Overall, VTE might represent a cross-species mechanism that underlies deliberation during foraging decisions.

One notable cross-species divergence comes from our regret results. In humans, we found that regret instances led to greater representation of the current location, whereas Steiner and Redish (2014) found that in rodents such instances were linked to better representation of the previous location—that is, the counterfactual offer. One possibility is that experiences of regret foster more present-focused deliberative processes in humans, whereby humans become more attentive to the current decision.

Conclusions

In the present study we employed a sequential experiential foraging paradigm to evaluate episodic simulation during deliberative decision making in humans. Our results revealed that visual cortices represented the current or foreground offer during the decision phase, particularly following regret-inducing experiences. Furthermore, humans demonstrated behavioral and neural signatures comparable to those of VTE, which could suggest a common mechanism of decision making that translates across humans, rodents, and monkeys.

Author note

We thank the members of the TRiCAM lab for assistance with data collection, with special thanks to Yizhou Ma and Amanda Reuter. We also thank both the TRiCAM and Redish labs for providing helpful discussion. This work was supported by grants from the National Institute on Drug Abuse to both S.V.A. (F31-DA040335) and A.D.R. (R01-DA030672); grants from the National Institute of Mental Health to A.D.R. (R01-MH112688, R01-MH080318); funds from the German federal state of Saxony-Anhalt and the European Regional Development Fund, Project: Center for Behavioral Brain Sciences, to M.H.; and a CLA Brain Imaging project grant (from the University of Minnesota) to A.W.M. The writing of this article was supported by the Department of Veterans Affairs Office of Academic Affiliations, the Advanced Fellowship Program in Mental Illness Research and Treatment, and the Department of Veterans Affairs Sierra-Pacific Mental Illness Research, Education, and Clinical Center. The authors have no conflicts of interest to disclose. S.V.A., A.D.R., and A.W.M. designed the experiment; A.D.R. and A.W.M. supervised the project; S.V.A. carried out the experiments and analyzed the data, with assistance from M.H., A.D.R., and A.W.M.; and S.V.A., M.H., A.D.R., and A.W.M. wrote the manuscript. The data and materials for all experiments are available upon request. None of the experiments were preregistered.