Hunger improves reinforcement-driven but not planned action

Human decisions can be reflexive or planned, being governed respectively by model-free and model-based learning systems. These two systems might differ in their responsiveness to our needs. Hunger drives us to specifically seek food rewards, but here we ask whether it might have more general effects on these two decision systems. On one hand, the model-based system is often considered flexible and context-sensitive, and might therefore be modulated by metabolic needs. On the other hand, the model-free system’s primitive reinforcement mechanisms may have closer ties to biological drives. Here, we tested participants on a well-established two-stage sequential decision-making task that dissociates the contribution of model-based and model-free control. Hunger enhanced overall performance by increasing model-free control, without affecting model-based control. These results demonstrate a generalized effect of hunger on decision-making that enhances reliance on primitive reinforcement learning, which in some situations translates into adaptive benefits.

Supplementary Information: The online version contains supplementary material available at 10.3758/s13415-021-00921-w.

Figure S1: Hunger did not alter other types of reinforcement learning. A) Repetition of second-stage choices was enhanced following rewarded trials and was not altered by food deprivation. B) Repetition of first-stage actions was enhanced following rewarded trials, regardless of stimulus identity or the level of food deprivation. C) There was no correlation between first-stage action repetition and the model-free (MF) values of first-stage choices. Action repetition did not contribute to model-free control of stage 1 choices. Error bars represent SEM. * p < 0.05, *** p < 0.001.
The second measure is stay-switch behavior for outcome-irrelevant information (Shahar et al., 2019). Participants may associate actions (i.e., choosing left or right), rather than stimulus identity, with reward outcomes and repeat an action after receiving a reward on the previous trial for that action. Given that only the stimulus identity, not the stimulus location, has predictive effects, this type of behavior could be seen as a measure of aberrant incidental learning, in which a value is assigned to an action that was not inherently rewarding. Participants repeated the same action more often after receiving a reward (main effect of reward [F(1,31) = 6.34, p = 0.017, η_p² = 0.17]). Food deprivation did not alter overall stay behavior (main effect of food deprivation [F(1,31) < 1]) or outcome-specific stay behavior (interaction effect of food deprivation and reward [F(1,31) < 1]; Fig. S1B). The significant effect of reward on action repetition did not affect the contribution of the model-free system to first-stage choices (Fig. S1C). Together, these analyses show that food deprivation selectively affected model-free control of first-stage choices, but not all types of reinforcement-driven learning.
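The action-repetition measure described above can be illustrated with a minimal sketch. It assumes that each trial is coded by the response side (0 = left, 1 = right) and a binary reward; the function name and the data coding are hypothetical and are not taken from the original analysis code.

```python
from typing import Dict, Sequence


def action_stay_probability(actions: Sequence[int],
                            rewards: Sequence[int]) -> Dict[str, float]:
    """Probability of repeating the previous action (left/right),
    split by whether the previous trial was rewarded.

    Assumed coding (hypothetical): actions are 0 = left, 1 = right;
    rewards are 0 = no reward, 1 = reward.
    """
    # previous-trial reward -> [number of stays, number of trials]
    counts = {True: [0, 0], False: [0, 0]}
    for t in range(1, len(actions)):
        prev_rewarded = bool(rewards[t - 1])
        counts[prev_rewarded][1] += 1
        if actions[t] == actions[t - 1]:
            counts[prev_rewarded][0] += 1

    def ratio(stays: int, total: int) -> float:
        # Undefined when a participant has no trials of this type
        return stays / total if total else float("nan")

    return {
        "after_reward": ratio(*counts[True]),
        "after_no_reward": ratio(*counts[False]),
    }
```

A higher value for "after_reward" than for "after_no_reward" corresponds to the main effect of reward on action repetition reported above.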

Computational modelling
We optimized the model parameters by minimizing the negative log-likelihood (LL) of the data, given different parameter settings, using MATLAB's fminunc function to obtain unconstrained parameter estimates: (1)
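To illustrate the general idea of fitting by minimizing a negative log-likelihood over unconstrained parameters, the sketch below uses a deliberately simplified, hypothetical two-option Q-learning model with a softmax choice rule. It is not the two-stage model fitted here, and a coarse grid search stands in for the gradient-based optimizer (fminunc) that we used; all function names are assumptions made for this example.

```python
import math
from itertools import product


def negative_log_likelihood(params, choices, rewards):
    """Negative log-likelihood of a choice sequence under a toy
    two-option Q-learning model (hypothetical, for illustration only).

    params: (a, b), unconstrained values mapped to a learning rate
            alpha in (0, 1) and an inverse temperature beta > 0.
    choices: sequence of 0/1 choices; rewards: sequence of 0/1 outcomes.
    """
    a, b = params
    alpha = 1.0 / (1.0 + math.exp(-a))  # logistic transform
    beta = math.exp(b)                  # exponential transform
    q = [0.0, 0.0]
    nll = 0.0
    for c, r in zip(choices, rewards):
        # Two-option softmax: probability of the option actually chosen
        diff = beta * (q[c] - q[1 - c])
        p_choice = 1.0 / (1.0 + math.exp(-diff))
        nll -= math.log(p_choice)
        # Delta-rule update of the chosen option's value
        q[c] += alpha * (r - q[c])
    return nll


def grid_search(choices, rewards, grid=None):
    """Return the (a, b) grid point with the lowest negative log-likelihood.
    A crude stand-in for a proper unconstrained optimizer."""
    if grid is None:
        grid = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
    return min(product(grid, grid),
               key=lambda p: negative_log_likelihood(p, choices, rewards))
```

In practice the same objective function would be passed to a gradient-based or simplex optimizer rather than evaluated on a grid; the grid is used here only to keep the example free of dependencies.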

Parameter transformations
The unconstrained Gaussian-distributed parameter estimates x_i ∼ N(µ_x, σ_x), with a population mean of µ_x and a standard deviation of σ_x, were transformed into bounded model parameters using logistic/exponential transformations to ensure that the model parameters were biologically plausible and interpretable. This transformation is justified by the premise that each individual subject (with a parameter value x_i) is randomly drawn from a population of subjects with normally distributed parameters (with population mean µ_x and a standard deviation of σ_x) (Daw et al., 2011; Wunderlich et al., 2012). This is important for our analysis because normally distributed parameters permit the use of parametric tests to identify differences between conditions. We denote the model parameters by Greek letters and the Gaussian-scaled parameter estimates by their respective Latin letters.
The relationship between the [0,1]-bounded model parameter α and the Gaussian parameter estimate a is given by a logistic function:

α = 1 / (1 + exp(-a)). (2)

The relationships between the [0, ∞)-bounded (logarithmically scaled) model parameters β and π and the Gaussian parameter estimates b and p are given by the exponential function:

β = exp(b), (3)

π = exp(p). (4)
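The logistic and exponential mappings described above can be written as a short sketch. The standard logistic form 1 / (1 + exp(-a)) is assumed for α, and the function names are hypothetical.

```python
import math


def alpha_from_estimate(a: float) -> float:
    """Map an unconstrained estimate a to a [0, 1]-bounded parameter
    (standard logistic function, assumed here)."""
    return 1.0 / (1.0 + math.exp(-a))


def beta_from_estimate(b: float) -> float:
    """Map an unconstrained estimate b to a positive parameter beta."""
    return math.exp(b)


def pi_from_estimate(p: float) -> float:
    """Map an unconstrained estimate p to a positive parameter pi."""
    return math.exp(p)
```

An estimate of 0 on the Gaussian scale therefore corresponds to α = 0.5 and to β = π = 1 on the bounded scale.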

Parameter recovery
To validate the parameter estimates generated by the fitting procedure, we conducted a parameter recovery analysis. For each parameter, we generated samples from the marginalized posterior distribution of the fit to get realistic parameters that could describe choice behavior. The generated parameters were uncorrelated (|R| < 0.3), allowing for testing whether the fitting procedure introduced any confounding factors. Using these parameters, we simulated choice behavior for 32 virtual subjects following the task design described in the method section. We then used a hierarchical model fitting procedure to estimate parameters for the simulated data ("Recovered parameters"). We assessed the quality of the parameter recovery by comparing the true parameters used to simulate data with the recovered parameters. We calculated the Pearson correlation between all pairs of recovered parameters to test whether the fitting procedure introduced spurious correlations. A strong correlation between the true and recovered parameters indicates a good recovery of the parameters and reliable model fitting results. All parameters were well recovered (0.65 < R < 0.95) and the model fitting procedure did not introduce spurious correlations between the other parameters (|R| < 0.4; Fig. S2 & S3).

Figure S2: Confusion matrix. Correlation matrix of the free parameters used to generate the simulated data ('True parameters') and the obtained parameters by applying the parameter estimation procedure on the simulated data ('Recovered parameters'). Bright red values indicate a strong correlation between the true and recovered parameter value and therefore a good parameter recovery.

Figure S3: Parameter recovery. Correlations of the free parameters used to generate the simulated data ('True parameters') and the obtained parameters by applying the parameter estimation procedure on the simulated data ('Recovered parameters').
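The correlation step of the recovery analysis can be sketched as follows. The sketch assumes that the true and recovered parameters are stored as dictionaries mapping a parameter name to one value per simulated subject; this data layout and the function names are hypothetical, and the simulation and refitting steps are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))


def recovery_table(true_params, recovered_params):
    """Correlate every true parameter with every recovered parameter
    across simulated subjects.

    Entries where the two names match index recovery quality; entries
    where they differ indicate trade-offs between parameters.
    """
    return {
        true_name: {
            rec_name: pearson(true_vals, rec_vals)
            for rec_name, rec_vals in recovered_params.items()
        }
        for true_name, true_vals in true_params.items()
    }
```

Calling `recovery_table(recovered, recovered)` gives the correlations among the recovered parameters themselves, which is the check for spurious correlations described above.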

Surrogate data
To confirm that our model can capture our key behavioral findings, we generated data for 32 virtual subjects on this task using the individual best-fitting parameters (Table 1) and the same trajectories of reward probabilities as experienced by the participants. Each simulation was repeated 30 times to obtain an average. These data were then subjected to a stay-switch analysis. We found the same pattern of effects in these generated data as observed empirically in our participants (Fig. 4C).

Table 1: Best-fitting model parameter estimates, shown separately for the sated and food-deprived conditions as median and quartiles across participants.

Note that our sample size and design were not intended to examine between-subjects effects. However, we wanted to rule out the possibility that age might drive some of the effects we see. Age was included in the ANOVA as a covariate, and there were no significant effects or interactions involving age.

Main effects for individual participants
Previous studies have reported effects of BMI on model-based decision-making (Voon et al., 2015). Hence, we asked whether we could see this effect in our sample. Including BMI as a covariate revealed several significant three-way interactions with BMI. Because four-way ANOVAs are notoriously difficult to interpret, we performed a median split on BMI. Heavier participants showed a strong effect of hunger on model-free learning (F = 8.61, p = 0.010), consistent with the effect reported at the group level. Lighter participants, in contrast, did not (F = 0.57, p = 0.461). Unexpectedly, these lighter participants showed an effect of hunger on transition (F = 10.07, p = 0.006), which we can only attribute to this sub-analysis being underpowered.