Introduction

Rapid recognition and efficient collection of food have been pivotal skills in species' evolutionary struggle for survival (Kivell et al. 2016). In modern environments with an omnipresent availability of attractive and palatable high-calorie food, however, this preferential and "hard-wired" handling of food poses a critical new challenge and constitutes a key component in the steady rise of overweight and obesity (Spence et al. 2016). Despite widespread knowledge about the detrimental effects of excessive eating and obesity, compelling food appears capable of overriding rational and deliberate decision-making. A key mechanism in this connection seems to be the subjectively high value of food (Hardman et al. 2020).

Dual-system models of eating behaviour propose that an impulsive system is modulated by a reflective cognitive control system which, in the best case, supports adaptive behaviour and healthy food choices (Friese et al. 2011). Cognitive control is associated with the dorsolateral prefrontal cortex (dlPFC) (Cole and Schneider 2007; Egner and Hirsch 2005; Fales et al. 2008; MacDonald et al. 2000). The dlPFC plays a crucial role in conflict situations where decisions have to be made or activated representations in working memory have to be updated (Badre and Wagner 2004). The right dlPFC in particular is involved in tasks in which response inhibition is needed to overcome impulsive prepotent actions (Blasi et al. 2006; Figner et al. 2010; Garavan et al. 2002; Knoch and Fehr 2007; Simmonds et al. 2008). Furthermore, the dlPFC is responsible for executing goal-directed behaviour by integrating neural information from other cortical areas. As part of a specific food-valuation system, hedonic values are pre-processed in the orbitofrontal cortex (OFC) and transmitted to the dlPFC, which initiates specific behaviour (Camus et al. 2009; Petrides and Pandya 1999). The mere presence of food stimuli can already activate the orbitofrontal cortex (Killgore et al. 2003; Morris and Dolan 2001). Attentional processes and processes of cognitive control are physiologically challenged in healthy populations when highly rewarding stimuli like food are involved (Chami et al. 2019). Across different psychological paradigms, food has been shown to lead to either slowed (Janssen et al. 2017; Johansson et al. 2005; Nijs et al. 2010a) or speeded reactions (Castellanos et al. 2009; Hou et al. 2011; Werthmann et al. 2015), depending on task demands and paradigm.

More complex appetitive behaviour such as grasping movements can be investigated in controlled environments by means of virtual reality (VR) combined with motion tracking. Food stimuli seem to play a specific role in grasping: Schroeder et al. (2016) reported that 3D food objects were collected faster than ball objects in stimulus-irrelevant colour-cued grasping, but not in warding movements. Relatively fast but simple approach movements towards various stimuli can be measured on touchscreens (Meule et al. 2019). It is quite possible that early stages of grasping behaviour reveal different behavioural patterns and that later stages call for a combined approach of touch- and VR-based methodologies (Gladwin et al. 2014). A systematic examination of differential and concordant effects of the response medium on food valuation and behavioural control in the manual interaction with food stimuli has not yet been undertaken; the present study should provide further insight into the impact of these methodologies on modelling food-related differences in valuation and behavioural control.

To follow up on the study by Schroeder et al. (2016), we established a more naturalistic scenario to investigate neurobehavioural manual movement initiation and execution based on recognition and decision processes. To this end, we implemented a binary forced-choice paradigm in a VR and in a 2D-touchscreen setup to investigate differences in the interaction with food stimuli in natural unimanual hand movements and their neurophysiological correlates. We expected to evoke a situation in which, on the one hand, cognitive control is challenged by presenting distractor and target stimuli simultaneously next to each other and, on the other hand, the food-valuation network is activated during the presentation of food items. If the food-valuation network is more involved in processing the subjectively high value of food, we hypothesize faster movement initiation towards food objects (Castellanos et al. 2009; Hou et al. 2011; Nijs et al. 2008, 2010b; Werthmann et al. 2013). If the cognitive control network is more involved, this should result in slower handling of food objects to prevent impulsive choices. Since the processing of alluring food stimuli likely involves more resources of the food-valuation network as well as of cognitive control, it should enhance activity in the dlPFC. Accordingly, the amount of this neural activity should correlate with interindividual differences in manual grasping movements towards food objects.

Methods

Participants

Healthy, right-handed and non-obese participants were recruited through announcements and e-mails to the mailing list of a German university. In total, 33 volunteers were recruited. Of those, two participants were excluded due to technical problems in performing the task (poor hand tracking) and one participant was excluded due to mental comorbidities. Thus, 30 healthy individuals (23 women, Mage = 22.30, SDage = 4.61, NBMI>25 = 8) participated in the experiment. Exclusion criteria were: left-handedness, current dieting, neurological and mental diseases according to self-report, vegetarian or vegan diet, current or lifetime eating disorders according to clinical interview, and BMI above 30. For their participation, the participants received either 8 €/h or course credit. The study was approved by the ethics committee of the Medical Faculty Tübingen (829/2018BO2) and all participants gave informed consent.

Apparatus

Virtual reality (VR)

During the behavioural task, participants were seated in a comfortable chair and wore a head-mounted display (HMD) which allowed for continuous tracking of head rotation (Oculus Rift CV1; Oculus VR, Inc., Menlo Park, USA). The HMD consists of two screens, each with a resolution of 1080 × 1200 pixels. The inter-pupillary distance was adjusted for each participant individually. A near-infrared sensor (Leap Motion Inc., San Francisco, USA) tracked the trajectories of the participants' hand. These trajectories were streamed in real time into the stereoscopic display so that participants could interact with virtual stimuli through actual movements of their hand, comparable to our previous setups (Lohmann et al. 2018; Schroeder et al. 2016). Stereoscopic presentation was controlled by Unity 3D (5.6.2f1) with a bundled version of OVRPlugin. The Leap Motion device was positioned on a small table in front of the participants and allowed for object interactions with the dominant hand in an area of approximately 1600 cm2 (effectively covering most of the participants' grasping range). The 3D stimuli originated from the Unity asset store (Unity Technologies, San Francisco, USA) or from Blender models (Blend Swap, LLC). To match shape and colour across stimuli, the objects were rescaled and recoloured. The stimulus set consisted of variations from three subcategories (balls, food, office objects). Ledoux et al. (2013) showed that food stimuli in VR are comparable to pictures of food and real-life food in triggering food craving. In total, 48 different stimuli were used. Stimuli were rated by the participants concerning valence, arousal, urge to grasp, aesthetics, subjectively estimated size and comfort of grasping on a continuous scale ranging from 0 to 100. Overall, stimuli were comparable regarding several practical dimensions except consumption value, see Appendix A.

2D touchscreen

To operate the 23-inch 2D touchscreen (iiyama ProLite T2336MSC) with a resolution of 1920 × 1080 pixels, the participants wore touchscreen gloves. The start position of the hand was marked by a hand symbol at the bottom centre of the touchscreen, which was positioned in landscape orientation. The distractor and target stimuli were presented in the top left and top right corners of the horizontally oriented display. The resolution of the stimuli was 150 × 150 pixels. Most of the stimuli originated from a public picture database (Blechert et al. 2014) and had been used in a previous experimental setup (Meule et al. 2019); the remaining photorealistic stimuli were downloaded from the internet to match the VR stimuli. Ledoux et al. (2013) showed that food pictures are comparable to real-life food in triggering food craving. Half of the stimuli consisted of screenshots of the VR stimuli, the other half of photorealistic stimuli. Stimuli were rated by the participants concerning valence, arousal, urge to grasp, aesthetics, subjectively estimated size and comfort of grasping on a continuous scale ranging from 0 to 100. Overall, stimuli were comparable regarding several practical dimensions except consumption value, see Appendix B.

Functional near-infrared spectroscopy (fNIRS)

An ETG-4000 Continuous Wave Optical Topography System (Hitachi Medical Co., Japan) was used to measure relative oxygenated (O2Hb) and deoxygenated (HHb) blood concentration as indicators for brain activity (Fallgatter et al. 2004). The sampling rate was 10 Hz. Two 3 × 3 probe-sets with 12 measurement channels each and an inter-optode distance of 30 mm were placed over the left and right prefrontal cortex after the HMD was mounted. According to the international 10–20-system, one probe-set was placed over F3 (channel #7) whereas the other probe-set was placed over F4 (channel #19; see Fig. 4) (Jasper 1958). Using this configuration, the NIRS channels were predominantly located over the left and right dorsolateral (Brodmann areas 9 and 46) and inferior frontal cortex (Brodmann areas 44 and 45), as extrapolated from reference points based on the Colin 27 template (Singh et al. 2005; Tsuzuki and Dan 2014; Tsuzuki et al. 2007).

Procedure

Participants were instructed not to eat for at least 3 h before the experiment and were asked at the appointment whether they had complied with this instruction. All participants declared compliance. The whole experiment was conducted in a single session lasting approximately 2 h and typically took place between 11.00–13.30 or 15.30–19.30. After assessing demographic data, the absence of a current or lifetime eating disorder was verified with section H of the SCID-I (Wittchen et al. 1997). Thereafter, participants were weighed on a scale. Height was determined by self-report. Food craving was assessed with the Food Craving Questionnaire–State (Meule et al. 2012) before and after the task in the VR; the results can be found in Appendix C.

Afterwards, participants were equipped with the HMD and the behavioural task in the VR started. The task began with practice trials to familiarize the participants with operating the system. Then the fNIRS probe-sets were mounted on the participants' head above the HMD. To start a trial, the participants had to put their hand in a standardized position. The start position of the right hand was marked by seven red spheres which turned green when the start position was correct. Subsequently, a fixation cross was presented which had to be aligned with a crosshair of the HMD for a duration of one second. This fixation ensured a standardized start position for the participants' head and gaze. With a stimulus onset asynchrony of 400 ms, a target and a distractor stimulus were then presented concurrently next to each other. The participants were instructed not to move their hand until the stimuli were presented; otherwise an error message was displayed. If the target stimulus was presented on the left table, the distractor stimulus was presented on the right table and vice versa. The positions of target and distractor stimuli were counterbalanced within a block. Each block consisted of 32 trials. A trial was finished when the target stimulus had been grasped and placed inside the box. If the participants placed the target stimulus outside the box, grasped the wrong stimulus or took longer than four seconds for the grasp, an error message was displayed and the next trial started. The distance between the start position and the target stimulus was around 40 cm. The whole behavioural task in the VR consisted of six blocks. Across participants, block order was counterbalanced and each participant was randomly assigned to a block order. After each block, there was a short self-paced break. The whole task took about 15 min. An exemplary trial is shown in Fig. 1. Afterwards, the behavioural task was conducted at the 2D touchscreen without fNIRS.

Fig. 1
figure 1

An exemplary trial of the condition "food". The target stimulus which had to be grasped was a chocolate cupcake with pink icing. After the initial hand pose matched the standardized hand pose, a fixation cross had to be aligned with the crosshair of the HMD for one second. With a stimulus onset asynchrony of 400 ms, the target and distractor stimulus were presented on the left and right table

The participants were seated in front of the horizontally oriented touchscreen, which was inclined at approximately 15° relative to the tabletop. The 2D-touchscreen task was analogous to the task in the VR, with the identical block order. At the start of each trial, the participant had to touch a hand icon on the display with five fingers. After 500 ms, a fixation cross was presented for 1 s and, with a stimulus onset asynchrony of 400 ms, the target and distractor stimulus were presented concurrently in the top left and top right corners of the display.

According to the block instruction given before each block, the participant had to collect the target stimuli and drag them into a 2D model of a box. Due to the faster start-up procedure of each trial and faster movements in general, the whole task at the touchscreen took only around 10 min. An exemplary trial is shown in Fig. 2.

Fig. 2
figure 2

An exemplary trial of the condition "food". The target stimulus which had to be grasped was a chocolate cupcake. After the hand icon had been touched with five fingers for 500 ms, a fixation cross was presented for one second. With a stimulus onset asynchrony of 400 ms, the target and distractor stimulus were presented in the top left and top right corners of the touchscreen

In both the VR task and the 2D-touchscreen task, each of the six blocks consisted of 32 trials. For two blocks each, participants were instructed to grasp one stimulus category as fast and as accurately as possible: food, balls or office tools. In each block, only two stimulus categories were presented. The order of the instructions was counterbalanced across participants. Each stimulus category served twice as target and twice as distractor, and category pairings were counterbalanced. Each variation of a stimulus category was presented in randomized order and balanced across the left and right table. Every pairing of the stimulus subcategories was realized.

Data analysis and statistics

Object rating analysis

All statistical inferences were conducted at a significance level of 5%. To test for differences in valence between the object categories (food, office tools, balls), paired t tests were conducted. In an exploratory analysis, Pearson correlation tests were conducted to investigate the correlation between subjective food valence ratings and reaction times.

Behavioural data analysis

All statistical inferences were conducted at a significance level of 5%. Incorrect trials were excluded from all analyses (2.52%). Reaction times above 2000 ms or below 200 ms for the movement onset, as well as reaction times above 4000 ms for the collection time, on the touchscreen and in the VR were considered premature/incorrect. Movement onset is defined as the time between the presentation of the objects and the start of the hand movement. Collection time is the time span from when the participant's hand has left the start position until the hand holding the target has reached the position above the collection box into which the target is dragged. Further, values deviating more than 2.5 SD from individual cell means were considered outlier responses. In total, 2.78% of the trials in the VR and 2.98% of the trials at the touchscreen were excluded. Only reaction times were analysed, as the block design is expected to affect reaction times rather than error rates (Zeligman and Zivotofsky 2017). To account for individual differences in the motor grasping of grasp-affordant objects, we standardized the reaction times for the food and office objects relative to the ball objects. As grasping ball objects resembles a purely motor action which has to be executed fast and in an automatic manner, this seems to be a valid baseline correction. Furthermore, the ball objects are less complex than the other objects and demand fewer cognitive functions such as planning. The individual's mean reaction time for ball objects was subtracted from each trial. To measure effects of the different target stimuli on the manual grasping times in the VR compared to the touchscreen, a linear mixed model approach was used to account for individual differences in grasping food stimuli. All linear mixed models were calculated with the lme4 package in R (Bates et al. 2014).
The linear mixed effects model approach contains a random effect for each subject, which captures interindividual deviations from the manipulated fixed effects, whereas the fixed effects represent the average effect of the manipulations on the reaction times across all participants. To estimate the significance of each fixed effect, log-likelihood ratio tests between a linear mixed model with the fixed effect and a linear mixed model without it were conducted. Two fixed effects were tested within the linear mixed model: the medium (VR vs. touchscreen) and the category of the target stimuli (food vs. office tools). For post hoc tests of the contrasts within a fixed effect of the linear mixed model, the lsmeans package of R was used. This package is based on the method of least-squares means, and post hoc tests were adjusted with the Tukey method (Lenth 2016). To estimate the degrees of freedom of the post hoc tests, the Satterthwaite approximation was used. Effect sizes for fixed effects were estimated with f2, which can also be used in linear mixed models (Aiken et al. 1991; Lorah 2018).
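The fixed-effect significance test via model comparison described above can be sketched as follows. This is a minimal illustration in Python with statsmodels rather than the lme4/R code actually used by the authors; the simulated data, effect sizes and variable names are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)

# Simulate ball-standardized reaction times (ms): 30 subjects x 2 targets
# x 20 trials, with a random intercept per subject (all values hypothetical).
rows = []
for subj in range(30):
    subj_shift = rng.normal(0, 20)               # random intercept
    for target in ("food", "office"):
        effect = -20 if target == "food" else 0  # assumed food speed-up
        for _ in range(20):
            rows.append(dict(subject=subj, target=target,
                             rt=subj_shift + effect + rng.normal(0, 50)))
df = pd.DataFrame(rows)

# Model with the fixed effect of target category vs. intercept-only model,
# both with a random intercept per subject, fitted by maximum likelihood.
full = smf.mixedlm("rt ~ target", df, groups=df["subject"]).fit(reml=False)
reduced = smf.mixedlm("rt ~ 1", df, groups=df["subject"]).fit(reml=False)

# Log-likelihood ratio test for the fixed effect (1 df).
lr = 2 * (full.llf - reduced.llf)
p = stats.chi2.sf(lr, df=1)
print(f"chi2(1) = {lr:.2f}, p = {p:.4g}")
```

Note that both models are fitted by maximum likelihood (not REML), which is required for log-likelihood ratio tests of fixed effects.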

An individual bias, separately for movement onset and collection time, was calculated for each participant as the difference between the ball-standardized reaction times for office tools and the ball-standardized reaction times for food objects:

(RTOffice − RTBall) − (RTFood − RTBall). Accordingly, a higher value indicates a more pronounced shift towards food.
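As an illustration, the ball-based baseline correction and the resulting bias score can be computed as follows (a minimal Python sketch with made-up reaction times; the column names are hypothetical):

```python
import pandas as pd

# Hypothetical per-trial reaction times (ms) for a single participant.
trials = pd.DataFrame({
    "category": ["ball", "ball", "food", "food", "office", "office"],
    "rt":       [420,    440,    400,    410,    450,      470],
})

means = trials.groupby("category")["rt"].mean()

# Ball-standardize: subtract the individual mean ball RT.
rt_food = means["food"] - means["ball"]      # 405 - 430 = -25 ms
rt_office = means["office"] - means["ball"]  # 460 - 430 =  30 ms

# Bias = (RT_Office - RT_Ball) - (RT_Food - RT_Ball);
# higher values indicate a stronger shift towards food.
bias = rt_office - rt_food
print(bias)  # 55.0
```

With these numbers the participant is 55 ms faster towards food than towards office tools after baseline correction, i.e. a positive food bias.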

A further exploratory analysis with linear mixed models investigated how the valence difference between the two objects presented in each trial affected the reaction times towards food. For this, reaction times were aggregated across participants. The valence difference score served as a fixed effect and a random intercept for each target stimulus was modelled.

fNIRS data analysis

Concentration changes of oxygenated (O2Hb) and deoxygenated haemoglobin (HHb) were used for the fNIRS analysis, which was conducted with customized Matlab scripts (Matlab 2017a; The MathWorks, Inc., Natick, MA, USA). First, missing channels were interpolated before motion-based artefacts were corrected with the temporal derivative distribution repair (TDDR) method (Fishburn et al. 2019). To minimize further artefacts of non-neural origin, a signal improvement relying on the assumed negative correlation between oxygenated and deoxygenated haemoglobin was applied (Cui et al. 2010), during which the two signals were combined into one "true oxy signal" that was then further analysed. A band-pass filter of 0.01 to 0.10 Hz was applied, and approximately 3.60% of the channels with remaining artefacts were interpolated manually. A Gaussian kernel filter with a standard deviation of σ = 40 was then used to remove global physiological artefacts (e.g., related to respiration) (Zhang et al. 2016). Finally, the data were z-standardized and the mean z-transformed amplitude from 10 to 40 s after block start (relative to a pre-task baseline of 5 s resting) was exported individually for further statistical analysis, separately for the individual average of the two "food blocks" (food as target with balls or office tools as distractors), the two "ball blocks" (balls as target with food or office tools as distractors) and the two "office blocks" (office tools as target with food or balls as distractors).
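The band-pass filtering, z-standardization and window-averaging steps of this pipeline can be illustrated as follows (a simplified Python sketch on simulated data; the TDDR correction, the Cui et al. signal improvement and the Gaussian kernel filter are omitted here, and the signal values are made up):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 10.0  # Hz, sampling rate of the ETG-4000

def bandpass(signal, low=0.01, high=0.10, fs=fs, order=3):
    """Zero-phase Butterworth band-pass (0.01-0.10 Hz, as in the pipeline)."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

rng = np.random.default_rng(1)
t = np.arange(0, 120, 1 / fs)  # 120 s of data
# Simulated single-channel "true oxy" trace: slow task wave + drift + noise.
raw = 0.5 * np.sin(2 * np.pi * 0.05 * t) + 0.01 * t + rng.normal(0, 0.2, t.size)

filtered = bandpass(raw)
z = (filtered - filtered.mean()) / filtered.std()

# Mean z-amplitude 10-40 s after block onset (block assumed to start at t = 0).
window = z[int(10 * fs):int(40 * fs)]
print(window.mean())
```

The zero-phase (forward-backward) filtering avoids shifting the haemodynamic response in time, which matters when a fixed 10–40 s window is averaged.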

In line with the previous statistical analyses, we standardized the fNIRS BOLD response for the food and office objects relative to the ball objects. The analysis focuses on the ball-corrected fNIRS BOLD signal for food versus office objects. To measure effects of the different target stimuli on the fNIRS BOLD response, paired t tests between the conditions were performed and Bonferroni-corrected for multiple comparisons. These paired t tests were performed on the channels covering Brodmann areas 9 (channels (Ch) 8, 11, 13, 15, 16, 18, 20) and 46 (Ch 5, 9, 19, 21, 22), which both contribute to the dorsolateral prefrontal cortex (dlPFC), as well as Brodmann areas 44 (Ch 6, 23) and 45 (Ch 2, 4, 7, 24), which both contribute to the inferior frontal gyrus (IFG).
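A channel-wise paired t test with Bonferroni correction of this kind might look like the following sketch (Python with simulated amplitudes; the condition means are made up for illustration, and only the Brodmann area 9 channel list from the text is used):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects = 30
channels = [8, 11, 13, 15, 16, 18, 20]  # e.g., the Brodmann area 9 channels

# Hypothetical ball-corrected mean z-amplitudes per subject and channel.
food = rng.normal(0.3, 1.0, (n_subjects, len(channels)))
office = rng.normal(0.0, 1.0, (n_subjects, len(channels)))

n_tests = len(channels)
results = {}
for i, ch in enumerate(channels):
    t_val, p_val = stats.ttest_rel(food[:, i], office[:, i])
    p_corr = min(p_val * n_tests, 1.0)  # Bonferroni correction
    results[ch] = (t_val, p_corr)
    print(f"Ch {ch}: t(29) = {t_val:.2f}, p_corr = {p_corr:.3f}")
```

Here the Bonferroni correction multiplies each uncorrected p value by the number of tests within the region of interest, capped at 1.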

Correlations between the fNIRS BOLD response and the individual biases of the manual actions were also calculated with Pearson's correlation test. An fNIRS BOLD response bias was calculated by the following formula:

(fNIRS-BOLDFood − fNIRS-BOLDBall) − (fNIRS-BOLDOffice − fNIRS-BOLDBall). Accordingly, a higher value indicates a stronger shift towards food.

The individual behavioural biases for each participant were correlated with the individual fNIRS BOLD response bias in the VR only, as there was no concurrent fNIRS measurement at the touchscreen. Target areas of the fNIRS BOLD response were Brodmann areas 9, 46, 44 and 45.
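The brain–behaviour correlation can be sketched as follows (Python, with simulated per-participant biases; the positive coupling strength is made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30  # participants

# Hypothetical per-participant biases; a positive coupling is built in.
fnirs_bias = rng.normal(0, 1, n)
onset_bias = 0.4 * fnirs_bias + rng.normal(0, 1, n)

r, p = stats.pearsonr(fnirs_bias, onset_bias)
# Equivalent t statistic with n - 2 = 28 degrees of freedom, as in the Results.
t = r * np.sqrt((n - 2) / (1 - r**2))
print(f"r = {r:.2f}, t(28) = {t:.2f}, p = {p:.3f}")
```

The t conversion shows why correlations over 30 participants are reported with 28 degrees of freedom in the Results section.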

Results

Object ratings

In the VR, food items received a significantly higher mean valence score (MVR = 66.92, SDVR = 14.68) than office tools (MVR = 51.29, SDVR = 12.65), t(29) = 4.62, p < 0.001, and than balls (MVR = 58.06, SDVR = 10.97). At the touchscreen, the mean valence of food items (MTS = 71.48, SDTS = 13.74) was significantly higher than that of office tools (MTS = 51.15, SDTS = 12.87), t(29) = 6.67, p < 0.001, and than that of balls (MTS = 58.75, SDTS = 13.66), t(29) = 4.78, p < 0.001.

Effects of food on different stages of manual movement in VR and touchscreen

Movement onset

Including the category of target stimuli as a fixed effect in the mixed model leads to a significantly better model than the random intercept-only model, χ2(1) = 106.87, p < 0.001, R2 = 0.09. The movement onset for the food objects as a target was significantly faster than for the office objects (M = 19.04 ms), t = 10.38, SE = 1.84, p < 0.001, f2 = 0.016.

Including the medium as a fixed effect in the mixed model leads to a significantly better model than the random intercept-only model, χ2(1) = 355.49, p < 0.001, R2 = 0.12. The movement onset for the objects in the VR is significantly slower than at the touchscreen (M = 34.43 ms), t = 19.18, SE = 1.80, p < 0.001, f2 = 0.050.

Including an interaction between the two fixed effects does not lead to a significantly better model, χ2(1) = 1.49, p = 0.222, R2 = 0.14. This indicates that the effects of medium and target category did not interact. In Fig. 3a the ball-object-corrected reaction times of the movement onset are depicted. For the raw reaction times of the movement onset, see Appendix D.

Fig. 3
figure 3

Movement onset (a) and collection time (b) in relation to the movement onset reaction times of the ball objects. The bars represent the standard error of fixed effect estimates. d-RT were calculated by subtracting the reaction time of the target stimulus from the reaction time of the ball objects

Collection time

Including the category of target stimuli as a fixed effect in the mixed model leads to a significantly better model than the random intercept-only model, χ2(1) = 30.58, p < 0.001, R2 = 0.11. The same applies to the medium: this fixed effect leads to a significantly better model than the random intercept-only model, χ2(1) = 49.27, p < 0.001, R2 = 0.11. Furthermore, those two fixed effects interact, χ2(1) = 15.11, p < 0.001, R2 = 0.12. Whereas the collection time of food and office objects does not differ significantly at the touchscreen (M = 9.31 ms), t = − 1.29, SE = 7.19, p = 0.566, the collection time of office objects is significantly faster than the collection time of food objects in the VR (M = 49.34 ms), t = − 6.70, SE = 7.36, p < 0.001, f2 = 0.002. That means, as soon as the participants had left their initial hand position, they collected office objects more quickly than food objects, but only in the VR. In Fig. 3b the ball-object-corrected reaction times of the collection time are depicted. For the raw reaction times of the collection time, see Appendix D.

Object ratings and manual action in VR and at the touchscreen

In the VR, neither the movement onset time correlates with the valence of food items (r = 0.18), t(14) = 0.67, p = 0.512, nor does the collection time (r = 0.22), t(14) = 0.86, p = 0.405. At the touchscreen, neither the movement onset time correlates with the valence of food items (r = − 0.11), t(14) = − 0.42, p = 0.680, nor does the collection time (r = 0.30), t(14) = 1.16, p = 0.266.

For the movement onset in the VR, there is a significant effect of the valence difference between the two objects presented in each trial (β = − 1.45, p < 0.001). For each rating point of difference in subjective valence between the two presented objects, the movement onset towards the food object was 1.45 ms faster. For the collection time in the VR, the valence difference is not a significant predictor (β = 0.24, p = 0.816). At the touchscreen, the valence difference predicts neither the movement onset (β = − 0.56, p = 0.056) nor the collection time (β = − 1.33, p = 0.146).

Functional near-infrared spectroscopy (fNIRS)

Effects of stimuli categories on fNIRS BOLD response in VR

A strong prefrontal activation in the ball-standardized food condition compared with the ball-standardized office condition can be seen (Fig. 4). In particular, in the region of the right dlPFC (Ch 19), a significantly higher activation in the food condition can be observed, t(29) = 3.29, p = 0.047. All other comparisons of the two standardized conditions in other brain regions are not significant. All paired t tests conducted on the channels covering the dlPFC and IFG are listed in Appendix E.

Fig. 4
figure 4

Map of the functional near-infrared spectroscopy (fNIRS) data showing neural activity during the VR task. The numbers depicted represent the corresponding channel numbers. The contrast between the ball-corrected food condition and the ball-corrected office condition is depicted and effect sizes are reported. Positive values indicate activation, negative values indicate deactivation

fNIRS BOLD response bias and manual movement in VR

A significant positive correlation (r = 0.37) between the fNIRS BOLD response bias of the right dlPFC and the movement onset bias is observed, t(28) = 2.10, p = 0.044. The stronger the fNIRS BOLD response to food objects in relation to office objects, the faster the decision to lift off the hand and to start approaching the food object compared to the office object. This correlation is depicted in Fig. 5. The correlation (r = 0.13) between the fNIRS BOLD response bias of the right dlPFC and the collection time bias is not significant, t(28) = 0.71, p = 0.486.

Fig. 5
figure 5

Correlation between individual movement onset bias and fNIRS BOLD response bias in the right dlPFC in the VR. Movement onset bias was calculated by subtracting the ball-standardized reaction times for food objects from the standardized reaction times for office tools. A positive value indicates faster reaction times to food stimuli compared to office stimuli. fNIRS BOLD response bias was calculated by subtracting the ball-standardized z-transformed fNIRS BOLD response for office tools from the ball-standardized z-transformed fNIRS BOLD response for food objects. A positive value indicates more neural activity for food compared to office tools

Discussion

The present study examined attentional and behavioural processes during the handling of food objects by assessing recognition and reaction times in both a VR setup and on a touchscreen device, while additionally assessing fNIRS-based brain activity in the VR. We found that manual interaction with food stimuli goes along with enhanced neural activity in the right dlPFC. In line with our hypothesis concerning the involvement of a food-valuation network, we found a faster movement onset towards food as compared to office tools in both interfaces (VR and touchscreen), accompanied by higher neural activity in the right dlPFC. In line with our hypothesis concerning cognitive control, after movement initiation a slower handling of food compared to office tools was observed in the VR. Of note, this effect was absent on the touchscreen.

A pivotal finding of this study is the enhanced neural activity in the right dlPFC when food stimuli were the target for the grasping movement. This food-related activity in the right dlPFC is in line with the proposal of the right dlPFC as a part of a specific food-valuation system (Camus et al. 2009). This model assumes an interconnection of the dlPFC and the orbitofrontal cortex (OFC) (Petrides and Pandya 1999). Hedonic values of stimuli are processed in the OFC and transmitted to the dlPFC to execute specific goal-dependent behaviour (Camus et al. 2009). Even if the participants are not free to choose which object they want to collect, an activation of a specific food-valuation system in the right dlPFC can be assumed since an adjustment of behaviour is required in each trial according to the localization of the food object. Apparently, food captures more neural resources in, for instance, estimating stimuli specific values and establishing goal-dependent behaviour. Therefore, it is reasonable to assume an additional food-specific effort.

The speeded reaction times in the first stage of manual interaction, the initiation of movement, are in line with previous research. This stage involves the recognition and selection of two different object categories. Research using the visual dot-probe paradigm has shown elevated attentional processing of food stimuli compared to control stimuli (Castellanos et al. 2009; Hou et al. 2011; Nijs et al. 2008, 2010b; Werthmann et al. 2013). This could be due to the significantly higher value and survival relevance of food compared to office tools. Although we did not find a significant correlation between the valence of food objects and the reaction times, we did find a significant effect of the difference in subjective valence between the two concurrently presented stimuli: a higher difference in subjective valence led to faster movement initiation, highlighting the role of value in attentional selection and movement initiation processes involving food. On a neuropsychological level, such elevated attentional processes are associated with the right dlPFC, a brain region mostly known for its role in response inhibition (Blasi et al. 2006; Mostofsky and Simmonds 2008). The neurophysiological correlates of the current study point to a more prominent role of the right dlPFC in selecting between different target stimuli rather than in suppressing an automatic response: the higher the brain activity in the right dlPFC, the faster the manual movement initiation. This correlational finding qualifies the role of the dlPFC in regulating early goal-directed behaviour in the interaction with food objects (Cornier et al. 2010; Horstmann et al. 2011).

The second stage of manual interaction, the collection time, was slower for food than for office tools and is therefore in line with the hypothesis of a cognitive control network. In contrast to the previous study by Schroeder et al. (2016), which reported speeded collection times for food, we implemented a new paradigm to assess more naturalistic behaviour. In the current study, participants had to react to a goal-relevant feature of the task and explicitly discriminate and select one of two concurrently presented stimulus categories. Accordingly, a more conscious and controlled handling of food could reflect the higher hedonic value of food and the associated need to suppress impulsive behaviour. As there was no significant correlation between collection time and brain activity in the right dlPFC, the involvement of the right dlPFC in a cognitive control network was less prominent than its involvement in a food-valuation network. Alternatively, attentional biases can be influenced by both appetitive and aversive motivational processes (Field et al. 2016). The finding that differences in manual interaction with food were more prominent in VR than on the touchscreen highlights the crucial role of the response medium and the need for further methodological research. These differences in interaction with food stimuli shed light on the underlying mechanisms of food valuation and cognitive control. It can be assumed that, particularly in VR, with its more complex and naturalistic behaviour and higher level of immersion, a more careful and thus slower handling of food results from the greater significance or personal importance of food objects. Consequently, VR appears more promising for possible future applications aiming to modify specific behavioural biases, for example in samples with disordered eating, whereas touchscreens potentially allow wider and easier dissemination.
While both media reflect food-specific cognitive differences in the recognition phase, VR seems to be more sensitive in modelling embodied cognition in the specific handling of food.

Conclusion

In sum, by means of a food-decision task in VR, we were able to document differences in interaction speed with food stimuli, partially linked to higher activation in the right dlPFC. Faster movement initiation on the one hand and slower handling of food on the other is consistent with the relatively higher appeal of food objects and a correspondingly more controlled interaction. These findings underline the significance of food-valuation and cognitive control networks in manual interactions with food and the particular role of the right dorsolateral prefrontal cortex. Food-specific behaviour was more evident in VR than on the two-dimensional touchscreen, which emphasizes the relevance of the medium in modelling food-specific behavioural differences.

Investigating these dynamics in a sample with disinhibited eating behaviour, such as obese individuals or patients with binge-eating disorder, could offer further insights into relevant behavioural characteristics and their neurophysiological signatures. These findings support the notion that targeted network stimulation and bias retraining may provide promising avenues for individualized modulation of disorders associated with an increased food bias. In this context, the use of immersive VR technology seems most promising for inducing behaviourally relevant effects.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.