Introduction

Food decisions are an integral part of our daily lives. The quality and quantity of the foods we eat have a substantial impact on our health and well-being. There has been a great deal of basic science and clinical research into the determinants and consequences of food choices, as well as numerous efforts to design and validate interventions that lead to healthier eating behaviors1,2,3,4,5. Food choice paradigms play a central role in much of this research.

There are several ways food choice paradigms are used in basic and clinical research. Food choice paradigms are a critical tool for investigating the nature of the decision process itself at the behavioral, computational, and neural levels6,7,8,9,10,11,12,13,14,15,16,17,18. In addition, they can be used to compare groups19,20,21,22,23, evaluate the effectiveness of a separate intervention (e.g. behavioral, pharmacological, surgical)2,3,4,16,24,25,26, or even be incorporated into the intervention itself27,28,29,30.

Food choice paradigms come in many different flavors. The exact details of the food choice task used in a given study will depend on the hypothesis being investigated and the practical constraints of the experimental setting. For example, in some cases, participants make choices over foods types and quantities in real buffet-like settings, fixed-option ad libitum meals, or snacking opportunities, while in other experiments plastic food replicas, pictures, or text are used to indicate the available options1. Often in studies using plastic replicas, pictures, or text, participants are asked to make many different food choices with the understanding that one of those choices will be selected to count, where participants will consume the chosen food “for real”. In general, individuals’ choices over food representations such as pictures are associated with aspects of their actual eating behaviors when measured on the same day or in the future and anthropomorphic measures20,21,31, indicating that these paradigms can have a useful degree of ecological and external validity.

Here, we report on decision patterns across five repetitions of a food-picture-based choice paradigm that explicitly cued participants to consider certain food attributes before making their choices on a subset of trials. We refer to this paradigm as the cued-attribute food choice task. Several previous studies have shown that, when participants complete the cued-attribute food choice task a single time, the attribute cues have significant effects on both choice outcomes and brain activity measured by functional magnetic resonance imaging (fMRI)7,13,32,33. Specifically, on trials in which they were cued to consider the healthiness of the food options before making a consumption decision, participants selected healthier items more often compared to baseline and other cue conditions (e.g., cues to consider tastiness or cues to choose naturally based on whatever comes to mind). Food choices in the health-cued condition have been associated with the structure and functional activity within and connectivity between the dorsolateral and ventromedial prefrontal cortex7,32,33.

However, it is unknown whether repeated experience or practice with the cued-attribute food choice task will lead to changes in choices during health-cued or baseline trials. Does the influence of the health cues increase, decrease, or remain stable with experience and/or time? Does repeated experience with explicitly considering health-related attributes in the cued trials spill over to affect choices in the baseline trials? Previous work has shown that choosing one food over another in an experiment setting can lead to both short and long-term changes in the relative evaluations of the chosen and unchosen items29,30,34. Thus one might expect changes in behavior when repeating the cued-attribute or any other food choice task. Furthermore, explicitly reappraising positive or negative aspects of food items during regulation of craving training has been shown to change subsequent dietary choices as well27,28. The cued-attribute food choice task does not necessarily involve active re-evaluation or reappraisal, but it does direct attention toward healthiness attributes that are often overlooked by the average decision maker when choosing between food items. It is plausible that once a food item appears in a health-cued trial, some of its healthiness aspects remain salient and influence its evaluation if it subsequently appears in a natural, baseline trial. This carry over from health-cued to natural decisions may increase with repeated exposures in health-cued trials leading to smaller differences in choice behavior between health-cued and natural trials after extended experience with the cued-attribute food choice task. Alternatively, more experience shifting into a healthiness-oriented mindset in response to the task cues might lead to greater differences between the health-cued and natural trials. However, past studies using this task found no evidence for changes in choice behavior within or across conditions over the course of a single session7. Moreover, food preferences in daily life are quite stable and behavior in non-cued food choice paradigms shows high test-retest reliability within a laboratory setting35. Thus, there are reasons to think that repeated administration of the cued-attribute food choice task will or will not lead to changes in dietary decision making.

We show that the average pattern of choices in the baseline condition as well as the influence of the health cues remain fairly stable over five repetitions of the task across 14 days. Furthermore, the test-retest reliability of individuals’ choices was high across both conditions, consistent with previous reports of high test-retest reliability in other food choice paradigms35. Interestingly, the reliability of subjective healthiness ratings was also high, but taste or palatability rating reliability was only fair.

Results

Figure 1 shows a schematic representation of the food choice task used in this study. Participants were instructed to fast three hours prior the experiment in order to increase the value of foods. In the beginning of the experiment, participants completed a rating phase for 180 images in which they judged in two different phases, how tasty or how healthy they thought each food item to be. In each trial of the subsequent choice task participants have to choose between two food items. Within the choice task, there were two types of decision conditions that differed in the cues that are provided for the participants. In the health-cued condition, subjects are cued to consider the healthiness of the foods while making decisions. In the natural-cued condition, subjects are cued to make decisions naturally based on whatever freely comes to their mind. At the end of each task session one of their choices was randomly selected and participants received and had to eat the chosen food while in the lab. Thus, in both the health-cued and natural-cued conditions, participants had to keep in mind the fact that they may have to eat the food they select at the end of that day’s session.

Figure 1
figure 1

Experiment structure. The experiment had two phases, ratings and choices. In the ratings phase, participants rated the taste and health aspects of the foods using a visual analog scale ranging from − 5 to + 5. The order of the two ratings was counterbalanced. In the choice task, participants had to choose one of the two food items to eat at the end of the experiment. Within the choice phase, there were two conditions that differed in the attention cues given to the participants. In the health-cued condition, subjects were cued to consider the healthiness of the foods while making decisions. In the natural-cued condition, subjects were cued to make decisions naturally using whatever features freely came to mind. At the end of the session on each day, one of the participant’s choices was randomly selected and the participant was given the chosen food to eat in the behavioral laboratory.

Choice outcomes

We examined the effects of attribute (taste, health), context (natural-cued and health-cued condition), and session (1:5) on healthy choice behavior using a Bayesian hierarchical logistic regression. This regression estimated the probability of selecting the healthier of the two food options as a function of the differences in tastiness and healthiness ratings between the foods, the cue condition (natural-cued, health-cued), and the experimental session (1–5). Table S1 show that, during the baseline (i.e. natural-cued) condition on the first session, the difference in the taste attribute was significantly associated with the choice outcome (1.22, 95% highest density interval (HDI) = [0.94, 1.51]). In contrast, the difference in the healthiness attribute was not significantly related to choice outcomes during natural-cued trials (\(-\,0.02\), 95% HDI = [− 0.20, 0.15]). In session 1, healthier choices were made more often in the health-cued relative to natural-cued condition. There was a significant main effect of health-cued trials (1.52, 95% HDI = [1.04, 1.98]), such that the healthier options were chosen significantly more in the health-cued condition compared to the baseline (natural-cued condition). There were significant interactions between the health cues and the differences in both the taste (\(-\,0.73\), 95% HDI = [\(-\,1.02\), \(-\,0.45\)]) and healthiness attributes (1.22, 95% HDI = [0.85, 1.60]). This means that the taste attribute had a significantly stronger effect in the natural-cued condition compared to the health-cued condition, and the health attribute had a significantly stronger effect in the health-cued condition compared to the natural-cued condition.

The overall pattern of behavior observed in session 1, remained similar in sessions 2–5 (Fig. 2). There was a slight increase in the average number of healthy choices during the natural condition in session 2 relative to session 1, but no significant difference in natural-cued choices between session 1 and any other session (Table S1). There was a significant increase in the number of healthy choices during the health-cued compared to natural-cued trials within all five sessions, although the difference between health and natural-cued trials decreased slightly after the first session (Table S1, interaction term condition:session). Figure 2 shows that there were slightly more healthy choices during the natural-cued and slightly few healthy choices in health-cued trials in sessions 2–5 versus 1. In addition, the influences of the taste and healthiness attributes differed less between the natural and health-cued trials in sessions 2–5 versus 1 (Table S1 interaction terms td:condition:session and hd:condition:session). Notably, the effects of health cues on the way the two attributes influenced choice outcomes and the overall proportion of healthy choices did not significantly differ in sessions 2–5. In other words, from session 2 onward, choice behavior was quite stable.

In addition to choice outcomes, we also tested the repeated subjective ratings for taste and healthiness attributes. The group average ratings did not significantly change across sessions (Fig. S1). The ratings for the taste and healthiness attributes give us information about the participants’ subjective opinions of those attributes. This in turn allows us to test how the subjective opinions about the taste and healthiness attributes influence the food choices participants make. This influence is not necessarily fixed, indeed it changes substantially across the two conditions as illustrated in Fig 2b. Together, the repeated choices and ratings indicate that, aside from a small change between the first and second sessions, the subjective ratings for taste and healthiness as well as the impact of those attributes on choice outcomes within each condition remain similar at the group-average level.

Figure 2
figure 2

Effects of health-cues and attribute differences on choice outcomes and RTs over the five sessions. (a,b) Shows the proportion of healthier choices on the y-axis as a function of condition (health-cued, natural-cued choice) or attribute differences (computed as healthier item minus less-healthy item) on the x-axis. The results in each session are indicated by the separate colors shown in the legend. (a) The panel shows that healthier choices were higher in the health-cued condition compared to the natural-cued condition in all five sessions. (b) The panel shows how the two attributes (taste and healthiness) relate to choice outcomes in each of the two conditions (natural-cued and health-cued). (c,d) are analogous to (a) and (b) except that they show the logarithm of response times on the y-axis as a function of condition or attribute differences on the x-axis. The error bars in both (a) and (c) represent the 95% HDIs.

To address the question of how well ratings from the first session relate to choices in subsequent sessions, we have done two sets of analyses. We used the fitted regression coefficients from the regression model model described in Table S1 to calculate the probability of choosing the healthier option with respect to the health and taste ratings (always from session 1) of the two options. If we treat a probability \(\le 0.5\) as choosing the less healthy option, and probabilities \(\ge 0.5\) as choosing the healthier option, then this model would coincide with the participants actual choice behavior \(72\%\) of the time in session 1 compared to \(73 \pm 1\%\) in sessions 2–5.

We also conducted a different set of analyses to directly predict choices from the taste and healthiness ratings separately. In this case the predictions were simply that the participants would choose the item that they rated as tastier (healthier) in session 1 over the alternative item. Taste ratings from session 1 predicted session 1 choices correctly \(72 \pm 3\%\) of the time in natural-cued trials, and \(57 \pm 2\%\) of the time in health-cued trials. Healthiness ratings from session 1 predicted session 1 choices correctly \(53 \pm 2\%\) of the time in natural-cued trials , and \(59 \pm 2\%\) of the time in health-cued trials. The regressions summarized in Tables S4 and S5 show that these prediction rates did not significantly change when using session 1 ratings to predict choices in the subsequent sessions, 2–5.

Response times

Participants responded faster in both choice conditions as their experience with the cued-attribute food choice task increased. However, the effects of the health cue and differences between the healthiness and taste attributes remained consistent over all five sessions. Health cues increased the influence of healthiness attributes on choice response times. The differences between the healthiness attributes of the two food options had no significant effect on the response times in the natural condition (0.00, 95% HDI = [\(-\,0.01\), 0.01]). However, differences in healthiness did influence response times for health-cued trials, such that participants responded faster when differences in health where larger (health difference interaction effect = \(-\,0.03\), 95% HDI = [\(-\,0.05\), \(-\,0.02\)]. In contrast, the differences between taste attributes were significantly associated with response times in both natural and health-cued trials, with larger differences in taste leading to faster response time in both conditions (taste difference main effect = \(-\,0.02\), 95% HDI = [\(-\,0.03\),\(-0.01\)]). In addition to the interaction between the health-cued condition and health ratings, there was also a main effect condition. Response times were significantly faster overall in the health-cued relative to the natural-cued condition (main effect of health-cue = \(-\,0.02\), 95% HDI = [\(-\,0.03\), 0.00]). The response time patterns in both conditions remained fairly stable across all five sessions, although participants became faster with more experience in the task and there were small changes in the sensitivity to taste and healthiness differences (Tables S2 and S3).

Time-varying diffusion decision model analysis

We used a time-varying diffusion decision model (tDDM) to examine if repeated experience with the cued-attribute food choice task changed the evidence accumulation process assumed to occur during food decisions. The specific tDDM that we used allows for each attribute (here taste and health) to begin to influence the decision process at a different time36,37,38. We refer to this model as the relative-starting-time DDM (rstDDM). The RST parameter represents the relative advantage in initial processing time for the taste attributes. If RST is positive, then taste-related attributes are considered before healthiness attributes, whereas if RST is negative then healthiness attributes are considered first. The drift weighting parameters in the rstDDM indicate the relative contribution of taste and health to the evidence accumulation process and decision outcome. Figure  3 shows the drift weighting and RST parameters by session and condition (see Table 1 for the full set of parameters).

Table 1 rstDDM Parameters.

Health cues had a significant and stable influence on rstDDM parameters across all five sessions. On average, participants weighted taste attributes more and considered them sooner than health attributes in the natural-cued condition during the first-session (taste weight: 0.8940, 95% HDI = [0.8200, 0.9660], health weight: \(-\,0.132\), 95% HDI = [\(-\,0.275\), \(-\,0.009\)], rst: 0.1790, 95% HDI = [0.021, 0.448]), and this pattern held across all sessions (Fig.  3). However, during the health-cued trials health weights were higher than in the natural trials (0.8120, 95% HDI = [0.7520, 0.8780]), while in contrast taste weights showed the opposite pattern and were significantly higher in the natural-cued condition relative to the health-cued condition (0.4670, 95% HDI = [0.4180, 0.5150]). Comparing the health and taste weights within each condition revealed that the health weight was significantly higher than taste weight in the health-cued condition (0.4870, HDI= [0.4290, 0.5430]), and taste weight were significantly higher than health weight in the natural-cued condition (0.7910, HDI= [0.7240, 0.8620]). The total contribution of taste plus health attributes to the drift rate was significantly higher in the health-cued than natural-cued condition (0.3450, 95% HDI = [0.2760, 0.4130]), which is consistent with the pattern of faster mean RTs in the health-cued trials shown in Fig. 2. In addition to weights, health cues also significantly changed the relative-starting times for each attribute within the decision process. In the first session, the RST was less than zero (i.e., healthiness was considered earlier than tastiness) in the health-cued condition, while it was greater than zero in the natural-cued condition (i.e., tastiness was considered earlier healthiness). Moreover, the RST parameter was significantly smaller in the health-cued compared to natural-cued trials (\(-\,0.1310\), 95% HDI = [\(-\,0.2680\), \(-\,0.0530\)]). Health-cues continued to promote the consideration of health attributes before taste across all five sessions. Across sessions, the consideration start times for healthiness and tastiness in natural cued trials became more similar (i.e. RST was closer to 0), and less variable across individuals (Fig.  3).

Figure 3
figure 3

Group-level rstDDM parameters by condition across sessions. These three plots show the overall trends for the taste weight, health weight, and RST parameters in the natural-cued and health-cued conditions across sessions. The values for the health and taste weights are given in arbitrary units, while the RST parameter is specified in seconds. The taste weight was higher in the natural-cued condition compared to the health-cued condition in all sessions (left panel). The health weight was higher in the health-cued condition compared to the natural-cued condition in all sessions (middle panel). The RST parameter was qualitatively lower (i.e. healthiness was considered earlier) in the health-cued condition compared to the natural-cued condition in all sessions (right panel), although there was substantial individual variability in the RST parameter during the natural-cued trials. All group-level parameters other than the RST parameter for natural-cued trials were quite stable and did not differ with repeated experience across sessions. The shaded bars indicate the 95% HDIs for each parameter. The natural-cued condition is indicated by orange color and health-cued condition is indicated by green color.

In general, individual participants’ rstDDM parameters were fairly stable across sessions. Figure 4 shows that the ranges (\(max - min\)) for the parameter estimates across the five sessions were fairly small, with the notable exception of the RST for natural-cued trials. Comparisons across conditions showed that the ranges were significantly smaller in the health-cued condition than in the natural-cued condition (Taste: \(tstat=-\,3.1051\) , \(pvalue=0.0052\), Health: \(tstat=-\,2.4076\), \(pvalue=0.0249\), RST: \(tstat=-\,14.5627\), \(pvalue = 8.8958e-13\)).

Figure 4
figure 4

rstDDM parameters stability. The boxplots show that the values of most subject-level parameters didn’t change much across the 5 sessions. The values for the health and taste weights are given in arbitrary units, while the RST parameter is specified in seconds. However, the changes in attribute weights on the drift rate over time/experience were significantly greater in the natural-cued condition compared to the health-cued condition. We address the high variability of the RST parameters in the natural-cued condition further in the discussion section.

Test-retest reliability at the individual level

Having found that group average decisions within the cued-attribute food choice task are fairly stable over 14 days and 5 repetitions of the task, we next tested the reliability at the individual level. Specifically, we computed the intra-class correlation coefficients (ICC), for three different task measures: 1) frequency of healthier choices, 2) subjective ratings of taste and healthiness, and 3) rstDDM parameters (health/taste weight, and RST). In the text below we interpret the ICC values as: Poor \((< 0.40)\), Fair \((0.4-0.6)\), Good \((0.6-0.75)\), and Excellent \((0.75-1.0)\)39.

Individuals’ choices showed excellent test-retest reliability across the five sessions. The frequency of healthier choices showed excellent reliability across the 5 sessions both within the natural- and health-cued conditions separately and as a difference score between conditions. (natural-cued condition, \(ICC(A,1) = 0.808\), \(F(22,86.8) = 23.3\) , \(p = 1.74e-27\), \(95\%\)-Confidence Interval: \(0.688< ICC < 0.901\), health-cued condition, \(ICC(A,1) = 0.812\), \(F(22,91.9) = 22.6 \), \(p = 6.06e-28\) , \(95\%\)-Confidence Interval: \(0.695< ICC < 0.903\), difference between conditions, \(ICC(A,1) = 0.812\), \(F(22,89) = 23.5\) , \(p = 4.79e-28\), \(95\%\)-Confidence Interval : \(0.694< ICC < 0.902\)).

The test-retest reliability for participants’ subjective healthiness and tastiness ratings was excellent and fair, respectively. Note that participants did not face the exact same choices in each session, but rather similar types of choices drawn from the same set of 180 food items. In addition to making choices, the participants rated each of the 180 food items for taste and healthiness in sessions 1 and 5. They rated a subset of 60 food items on taste and health in all five sessions. We computed the ICC both for full set of 180 and subset of 60 items. Rating reliability at the individual level was excellent for health and fair for taste ratings (60 items in all 5 sessions, health: \(0.808 \pm 0.171\), taste: \(0.579 \pm 0.184\), Table S6, 180 items in sessions 1 and 5, health: \(0.778 \pm 0.216\), taste: \(0.532 \pm 0.190\), Table S6). We also analyzed rating reliability as a function of food image rather than individual. At the food image level, we found that reliability was fair for both health and taste ratings (across five sessions: health: \(0.552 \pm 0.146\), taste: \(0.574 \pm 0.120\), Table S7, between session 1 and 5 alone: health: \( 0.468 \pm 0.251\), taste: \(0.516 \pm 0.197\), Table S8).

Lastly, we computed the test-retest reliability of the rstDDM parameters fit to the participants’ choice outcomes and response times. Table 2 reports the ICC values for each parameter by condition. The reliability of the parameters ranged from excellent to good in the health-cued condition. However, the reliability of the same parameters fit to the same individuals was only fair to poor in the natural-cued condition. As a consequence of the lower reliability of the parameters in the natural-cued condition, the reliability of the difference scores for taste and health weights is also lower.

Table 2 Intra-class correlation coefficients (ICC).

Discussion

Previous research has shown that food choice tasks using photographic images as stimuli have good test-retest reliability when participants make choices naturally (i.e. without explicit instructions about how to choose or what attributes to consider)35. However, unlike the previously tested paradigms, there are clear differences in the ways the same participant will make choices across the different conditions within the cued-attribute food choice task7,13,32,33. Participants decision outcomes and response times are more strongly influenced by healthiness attributes and they choose the healthier item more often in health-cued compared to natural-cued trials. We tested whether these intra-individual differences changed over the course of 14 days and five repetitions of the cued-attribute food choice task. Overall, choice outcome patterns were stable and had excellent test-retest reliability within and across the decision conditions over the five task repetitions.

Interestingly, healthiness ratings had excellent reliability, but the reliability for taste ratings was only fair. This suggests that an individuals’ estimates of the tastiness of food items may vary more from day to day, or even moment to moment than their opinions about healthiness. Previous work has found that liking or attractiveness ratings for food items, which are presumably closely tied to taste, vary from the first to second round of ratings within a given day40,41. Palatability is more subjective than healthiness and may be influenced more by previously rated items, context effects, or noise during the process of retrieving the past consumption experiences that are thought to be used as a basis for present valuations42. Healthiness is not something that is directly experienced and thus may not be computed using samples from episodic memory the same way tastiness is presumed to be. While the current study was not designed to directly address the sources of consistency and variability within and across different food attributes, this is an important avenue for future research.

Although choice outcomes had excellent reliability in both the natural and health-cued task conditions, the reliability of rstDDM parameters differed across conditions. A much stronger reliance on tastiness than healthiness attributes may have led to the lower test-retest reliability for rstDDM parameters fit to the natural-cued choices. Both the rstDDM and the logistic regression analyses showed that, on average, healthiness had little influence on food choice outcomes during natural-cued choices in our sample of participants. Figures 2b and 3 show that, holding tastiness fixed, the influence of healthiness on food decision is approximately zero at the group level. This is not too surprising given that the individuals were recruited because they consumed sweet or savory snacks on a regular basis, but attempting to eat a healthy diet in daily life was not an inclusion criterion. Thus, concerns about the palatability of the food items dominated the decision process in natural trials. The near-zero influence of healthiness on food choices in natural trials means that the rstDDM has little power to accurately identify the relative attribute weighting and consideration start timing parameters (RST).

The rstDDM model we fit to the data can accurately identify and distinguish between attribute weights and RST parameters when both weights are non-zero36,38, but (near) zero weights are a problem for it. Either a sufficiently low drift weighting coefficient or long delay in consideration starting time can effectively eliminate the influence of an attribute on choice outcomes within the rstDDM framework. The excellent reliability of the drift weighing parameters and good reliability of the RST parameter in the health-cued trials proves that the rstDDM can yield highly reliable parameter estimates if fit to a suitable set of data. Thus, the lower reliability of the rstDDM in the natural-cued trials serves to demonstrate the importance of checking that the data conform to the expectations and requirements of a given modeling approach. If one of the drift weights from the rstDDM is close to zero for a given data set, then one is probably better off fitting the more common version of the DDM43, which fixes the RST parameter to zero and only estimates the drift weights (see Table S9 for the full set of parameters, and Table S10 for the reliability values).

Given its high level of test-retest reliability, the cued-attribute food choice task appears to be well suited to serve as an outcome measure for cognitive, physical, or pharmacological interventions that target food consumption decision processes. This task may be useful as an outcome measure because examines decisions with and without an explicit cue/instruction to consider healthiness aspects during the choice. The health cues cause participants to make more choices in favor of the healthier option and exhibit brain activity pattern similar to those observed in participants that exert self-control over dietary choices endogenously7,13,19,32,33. Although, there is a substantial increase in the proportion of healthier choices made in the health-cued condition, on average, participants do not always choose the healthier item in these trials. Thus, the task could be used to determine if an intervention increased cued dietary self-regulation. In addition, the natural, uncued trials allow experimenters to assess the baseline, unprompted choice behavior. In this case, it provides insight in to endogenous self-regulation. It may be that a specific intervention eliminates the difference between health-cued and natural trials by bringing the natural attention and decision mechanisms in line with those prompted by the cues. Ideally, an intervention designed to promote healthier food choices would cause its participants to make healthy choices at near-ceiling rates in both task conditions. If it turns out that an intervention boosts healthy choices in one condition, but not the other, then we would know that the intervention is more effective for externally vs internally cued regulation or vice versa. We think these features of the cued-attribute food choice task and its high level of test-retest reliability recommend it as one of a suite of intervention outcome measures. However, it is important to note that here we have only tested healthy young adults. Further examinations of test-retest reliability in individuals with obesity, eating disorders, and other conditions are warranted.

The cued-attribute food choice task explicitly prompts participants to incorporate health-related attributes into their decision process. In combination with functional magnetic resonance imaging, the health-cued condition within this task has been shown to engage prefrontal brain activity patterns similar to those observed during endogenous dietary self-control in individuals who successful lost or maintained their desire weight7,19,32,33. Thus, the cued attribute food choice task may be a means of testing how well an intervention shifts externally cued and endogenous health-promoting decision processes.

Methods

Participants

Twenty-three participants completed the five sessions of the experiment across 14 days. All the participants (10 female, ages between 18 and 30 years) gave written informed consent for their participation in accordance with the regulations of the Zurich Cantonal Ethics commission. They received a flat fee in addition to their food reward for the time they spent for the task. We screened participants by email and phone to ensure that they did not follow any specific food diet, and they consumed sweet and/or savory snacks regularly. We also rechecked the exclusion criteria on each day of testing. All participants were healthy and without any current or recent psychiatric, metabolic or neurological illness. On the day of the study, participants sent us a photograph of the meal they consumed 3 hours before their appointment as an indication that they followed our instructions to eat a small meal and then fast for 3 hours before coming into the lab. All five laboratory visits took place between 17:15 and 19:15. The first visit occurred on a Wednesday, visits 2:3 took place on the Monday, Wednesday, and Friday of the following week. The fifth and final visit took place on a Wednesday two weeks after the initial visit. All procedures were approved by and performed in accordance with the guidelines and regulations of the Zurich Cantonal Ethics commission.

Behavioral task

Participants were asked to eat a small meal three hours before their appointment and consume nothing but water in the meantime, in order to be hungry and thereby increase the value of the foods during the experiment. The task involved two sequential phases of food rating and food choice, which participants performed in all 5 sessions. Before each phase, subjects performed a short training session to become familiar with the task. In the food rating phase, participants judged how tasty or healthy each food item is in two different blocks. The ratings were done on continuous scale with anchors of \(-\,5\) and \(+\,5\) at each end. The order of taste and health rating blocks were counterbalanced, and the order of images within each block was randomized. In the first and last sessions (1 and 5), the ratings were done over all 180 food images, but in the middle sessions (2–4), the ratings were done for a subset of 60 food images whose average ratings in previous studies spanned the total ranges for taste and healthiness. We reduced the size of the rating set in order to save time in sessions 2:4 and maximize participant retention. We used the healthiness and taste ratings provided in first session when analyzing the choice data in all 5 sessions.

In the food choice phase, subjects had to choose which of two food items they would eat at the end of the experiment. The food pairs were constructed so that choosing the healthier item often required forgoing the subjectively tastier one, and we refer to these cases as challenge trials. The food pairings of all 5 sessions for each participant were constructed based on the ratings of his or her first session. One of the participant’s decisions were randomly selected and implemented after the task. There was a 30-minute waiting period following the end of the session during which participants ate their selected food item. This 30-minute waiting period made the decisions more meaningful because participants did not have access to other food sources at that time, and they had not eaten for approximately 4.5 hours at that point.

Within the choice phase, there were two types of conditions that differed in the attention cues that are provided for the participants. In the health-cue condition, subjects were cued to consider the healthiness of the foods while making decisions. In this condition they were supposed to choose the healthier of the foods as often as they could, while keeping in mind that they would potentially have to eat the food they choose. In the natural-cue condition, subjects were cued to make decisions naturally based on whatever freely came to their minds. The food choice task consisted of 3 runs, with each run consisting of 5 or 6 blocks, 210 trials in total. The order of condition blocks was pseudo-randomized across subjects. Each subject faced 9 health-cue condition blocks (\(110.6 \pm 5.2\) trials) and 8 natural-cue condition blocks (\(99.3 \pm 5.2\) trials).

Choices were presented on the screen for 3 s and there was a jittered inter-trial interval was of 2–6 s. After the task was completed, one trial randomly was selected to be realized for each participant according to his or her choices. The participants stayed in the laboratory for 30 min to eat their food reward.

Computational tools

Logistic regression for choice outcome

We ran a Bayesian hierarchical logistic regression analysis with the brms package44,45 in R46, to estimate the participants choice. We used the uninformative priors default in the brms package. We estimated the probability of choosing the healthier food option, as a function of the difference in taste ratings (td) and health ratings (hd) between the healthier and less-healthy options, condition (natural-, and health-cued conditions), and session (5 sessions). The taste difference (td) is calculated by the taste of the more healthy option minus the taste of the less healthy option. The health difference (hd) is also computed in the same way. The model formula in brms syntax was:

$$\begin{aligned}&choice(healtier) \sim 1 + (hd + td) * condition * session \nonumber \\&\quad +(1 + (hd + td) * condition * session | subject) \end{aligned}$$
(1)

Note that asterisks indicate interaction terms and thus the model included all main effects and interactions between regressors except for interactions between taste and health differences. The first line lists the regressors and interactions that were estimated at the group or population level, and the final portion on the second lined inside parentheses shows those that were estimated at the subject level. In this case, all terms were estimated at both levels.

Linear regression for response time

We ran another Bayesian hierarchical regression analysis with the brms package in R, to estimate the response times (logarithm) as a function of difference in taste ratings and health ratings between two competing options, condition (natural-, and health-cued conditions), session (5 sessions), challenge type (when there is a conflict between taste and health and they have similar ratings). The health and taste ratings were z-scored, and all other variables were categorical. We used the uninformative priors default in the brms package. The model in brms syntax was:

$$\begin{aligned}&log(RT) \sim 1 + (hd + td) * condition * session * challenge \nonumber \\&\quad +(1 + (hd + td) * condition * session * challenge | subject) \end{aligned}$$
(2)

Note that asterisks indicate interaction terms and thus the model included all main effects and interactions between regressors except for interactions between taste and health differences. The first line lists the regressors and interactions that were estimated at the group or population level, and the final portion on the second lined inside parentheses shows those that were estimated at the subject level. In this case, all terms were estimated at both levels.

Sequential sampling model

We used the relative-starting-time DDM (rstDDM) model to estimate the response times and choice outcomes of the participants. This model was previously tested and validated for binary food choices36. This analyses have been implemented in R46. In this work, we used the hierarchical Bayesian fitting procedure described here: https://github.com/galombardi/method_HtSSM_aDDM. We fit each session and condition separately. The following seven free parameters were estimated in each fitting: stochastic component of evidence accumulation (noise), starting point bias (\(SP \text { } bias\)), non-decision time (NDT), relative start time for health (RST), weight of taste attribute on drift rate (\(\omega _{taste}\)), weight of health attribute on drift rate (\(\omega _{health}\)), and intercept for drift rate (\(Drift \text { } bias\)). The decision thresholds were fixed to [− 1, 1]. Note that the target variable in this analysis is the choosing left versus right option.

Test-retest reliability(ICC)

To determine the test-retest reliability of the task across the five sessions, we computed intra-class correlation coefficients (ICC). We computed ICCs for three different task dimensions including frequency of healthier choices, and DDM parameters (focusing on drift bias, health/taste weight, and RST), and participants’ ratings (health and taste). All of the variables were continuous, therefore we used the icc function in the irr library in R47 with the following configuration: \(icc(data, model = ''twoway'',type = ''agreement'',unit = ''single'', r0 = 0,conf.level = 0.95)\). We follow Shrout and Fleiss39 and interpret the ICC values according to the following convention: Poor \((< 0.40)\), Fair \((0.4-0.6)\), Good \((0.6-0.75)\), and Excellent \((0.75-1.0)\).