Belief inference for hierarchical hidden states in spatial navigation

Katayama, Risa; Shiraki, Ryo; Ishii, Shin; Yoshida, Wako

doi:10.1038/s42003-024-06316-0


Article
Open access
Published: 21 May 2024

Volume 7, article number 614, (2024)

Communications Biology


Risa Katayama ORCID: orcid.org/0000-0002-7217-3096^1,2,
Ryo Shiraki¹,
Shin Ishii^1,3,4 &
…
Wako Yoshida^5,6


Abstract

Uncertainty abounds in the real world, and in environments with multiple layers of unobservable hidden states, decision-making requires resolving uncertainties based on mutual inference. Focusing on a spatial navigation problem, we develop a Tiger maze task that involves simultaneously inferring the local hidden state and the global hidden state from probabilistically uncertain observations. We adopt a Bayesian computational approach by proposing a hierarchical inference model. Applying this to human task behaviour, alongside functional magnetic resonance brain imaging, allows us to separate the neural correlates associated with reinforcement and reassessment of belief in hidden states. The imaging results also suggest that different layers of uncertainty differentially involve the basal ganglia and dorsomedial prefrontal cortex, and that the regions responsible are organised along the rostral axis of these areas according to the type of inference and the level of abstraction of the hidden state, i.e. higher-order state inference involves more anterior parts.

Introduction

In everyday behaviour, the current state of an environment often remains hidden from direct observation and must be inferred through belief. These beliefs concerning hidden states inherently carry uncertainties stemming from two distinct sources. First, the hidden state may not be inferred accurately because the available information is insufficient or inaccurate, i.e., due to observational uncertainty. The second source of uncertainty arises when the hidden state cannot be uniquely identified (i.e., two or more states look exactly the same, even with perfect observation). This uncertainty occurs due to the inherent structure of the environment itself, i.e., state uncertainty.

This complex scenario, involving the simultaneous handling of these different levels of hidden states, often manifests in real-world spatial navigation challenges. For example, consider navigating the Gion Festival in Kyoto, which is known for its beauty and bustling crowds. The streets form a grid pattern resembling a Go board, and successfully navigating through the festival requires you to identify your current location while avoiding the several floats, the centrepieces of the festival, where you can easily get stuck in the crowd. The floats are strategically positioned throughout the city, and a distinctive musical accompaniment known as the Gion Bayashi can be heard as you approach them. The direction from which this sound is heard serves as a vital clue for identifying your location. However, the auditory information contains uncertainty as the floats can move around. In such scenarios, both the perceived sound direction and your location in the city constitute hidden states that need to be estimated. This decision-making process, guided by the estimation of these hidden states, can be formulated as a partially observable Markov decision process (POMDP). In cases featuring a hierarchy of hidden states, this problem can be classified as a hierarchical POMDP. Despite the success of hierarchical POMDP models as machine learning methods and their widespread use in artificial intelligence for facilitating adaptive behaviour^1,2,3,4,5,6, our understanding of how the brain processes uncertain information to solve such challenges remains limited.

In this study, we computationally examined inference processes in the human brain and investigated how various brain regions contribute to the inference of multi-layered hidden states. In neuroscience, hierarchical structures have proven useful as neural and behavioural models for explaining complex perceptual learning^7,8,9,10,11,12, motor learning^13,14,15, and social decision-making^16,17. Neuroimaging investigations focused on prediction problems within hierarchical state spaces have revealed that predictions at different levels of hierarchy correlate with neural activity in distinct brain regions, particularly in perceptual¹² and reward-based decision-making¹⁸. We posit that this segmentation of neural activity based on hierarchy also extends to higher cognitive systems. Several brain imaging studies have examined spatial navigation tasks involving uncertainty^19,20,21 and have identified activity in the medial prefrontal cortex (mPFC) with respect to state inference¹⁹. The medial surface of the frontal region has been linked to beliefs about hidden states in studies on both non-human primates²² and humans^23,24,25, leading to the hypothesis that different areas of this region may be involved in inferring different levels of hidden states within a hierarchical environment.

To test this hypothesis, we designed a Tiger maze navigation task that combines two distinct yet representative types of POMDP problems: the classic Tiger problem with observational uncertainty and partially observable maze navigation involving state uncertainty. In the partially observable maze problem, an agent estimates the current location within a maze whose structure has already been learned from scenes that show only a limited area around the current location¹⁹. Within this maze, multiple locations offer the same visual scene, establishing a one-to-many relationship between observations and hidden states. As the agent navigates through the maze, the increasing history of observations aids in identifying the hidden state by progressively narrowing down the possible locations. Here, we introduce the probabilistic observations used in the Tiger problem into this maze task. In the Tiger problem, an agent faces two doors, one of which conceals a tiger (the tiger door). The agent has the choice of opening either door or choosing to listen at the door for the tiger’s roar (at a small cost), which can be observed probabilistically²⁶. In the Tiger maze navigation task, each grid has four doors, including a tiger door, with the visual scene being identical in each grid, and listening actions can be performed as many times as desired. In other words, the grid location can only be inferred by inferring the position of the tigers, which the participants have learned in the training session (see the “Methods” section, specifically the Training task). This arrangement provides probabilistic observations that can be used to estimate which door the tiger is likely to be behind, while simultaneously assisting in estimating the grid location in the maze. Thus, the Tiger maze navigation problem constitutes a POMDP with two distinct types of hidden states.
In this problem, the position of the tiger door can be estimated directly from the observations, whereas the grid location can only be indirectly inferred using both these observations and map information. If the grid location can be identified, the position of the tiger door is uniquely determined, highlighting the hierarchical structure of this problem, with the grid location representing the higher level. Here, while most maze navigation studies have assumed an environment with an observable state and an explicit goal state and regarded the navigation process as a decision-making problem^27,28,29 or the generation of a spatial predictive map^30,31, the Tiger maze navigation task is a problem-setting focused on hidden-state inference.
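The listening step of the classic Tiger problem described above can be written as a Bayes update. The following is a sketch under stated assumptions: two doors, and a hypothetical listening accuracy p (the probability that the roar is heard from the true tiger door). The value p = 0.85 is chosen only for illustration and is not taken from this task.

```latex
b'(s) = \frac{P(o \mid s)\, b(s)}{\sum_{s'} P(o \mid s')\, b(s')}, \qquad
P(o \mid s) = \begin{cases} p & \text{if } o = s \\ 1 - p & \text{otherwise} \end{cases}

% Worked example with b(left) = 0.5 and p = 0.85 (illustrative values).
% One roar heard on the left:
b'(\text{left}) = \frac{0.85 \times 0.5}{0.85 \times 0.5 + 0.15 \times 0.5} = 0.85
% A second roar heard on the left:
b''(\text{left}) = \frac{0.85 \times 0.85}{0.85 \times 0.85 + 0.15 \times 0.15} \approx 0.97
```

Under these assumptions, repeated listening sharpens the belief about the tiger door, which is what makes listening worth its small cost.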

Results

Behavioural results

In this study, twenty healthy participants engaged in the Tiger maze navigation task using a pre-trained maze (Fig. 1b). The goal of the game is to move around the maze without getting eaten by the tiger, until a termination condition is reached, based on either surviving for a set number of trials or visiting a set number of grids. The success rate, that is, the proportion of games where the participants completed their exploration without opening the tiger door, was 81.3 ± 11.1% (equivalent to 19.5 ± 2.7 out of 24 games in total). Each trial of the game begins with an action phase in which participants choose whether to move through the maze or listen to the tiger’s roar, followed by a prediction phase in which they predict both the tiger door position and the grid location and report their confidence level for each (Fig. 1a). The mean prediction accuracy for the tiger door position and grid location was 80.6 ± 3.5% and 57.3 ± 8.6%, respectively, and both prediction accuracies were significantly higher than chance (Wilcoxon signed-rank test; tiger door, p = 8.9 × 10⁻⁵; grid, p = 8.9 × 10⁻⁵). The prediction accuracies were also higher when the participants reported a high confidence level than when they reported a low confidence level (Fig. 1c, Wilcoxon signed-rank test; tiger door, p = 8.9 × 10⁻⁵; grid, p = 8.9 × 10⁻⁵).
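The value p = 8.9 × 10⁻⁵ recurs for several of the Wilcoxon signed-rank tests. One possible reading, offered here only as a hedged arithmetic check, is that it is the two-sided p-value obtained from the normal approximation (without continuity correction) when all 20 participants show an effect in the same direction. The source does not state which variant of the test was used, so the approximation and the function name below are assumptions.

```python
import math

def signed_rank_p_all_same_sign(n):
    """Two-sided normal-approximation p-value for a Wilcoxon signed-rank
    test in which all n paired differences share the same sign.

    Assumes no ties, no zero differences, and no continuity correction.
    """
    mean_w = n * (n + 1) / 4                          # E[W] under the null
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # SD[W] under the null
    z = (0 - mean_w) / sd_w                           # smaller rank sum is 0
    return math.erfc(abs(z) / math.sqrt(2))           # equals 2 * (1 - Phi(|z|))

p = signed_rank_p_all_same_sign(20)
print(f"{p:.2e}")  # about 8.9e-05
```

If this reading is right, the repeated value simply marks the smallest p-value this approximation can return for 20 participants.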

During the action phase of the task, participants observed the scene and chose either to move or listen. It was impossible to infer the hidden current state from the scene alone, as the scene that could be observed was the same in all states. Thus, to correctly answer the predictions in the subsequent prediction phase, it was necessary to observe the direction of a tiger roar through the listening action and infer the position of the hidden current grid and the tiger door. The participants indeed selected to listen more often in the early stages of maze exploration, and the probability of selecting to listen was negatively correlated with the number of trials per game (Fig. 2a, r = −0.71, p = 3.5 × 10⁻³²). The proportion of listening actions also increased when the participants were less confident in their predictions (Fig. 2b, main effect for tiger door confidence, F(1,72) = 84.5, p = 9.1 × 10⁻¹⁴; main effect for grid confidence, F(1,72) = 57.2, p = 1.0 × 10⁻¹⁰; interaction, F(1,72) = 4.1, p = 4.6 × 10⁻²). Note that the significant decrease in the proportion of Listen trials at the third trial arises because the participants typically selected their first move action at this specific trial index (64.2 ± 22.2% of all games).

In the prediction phase, participants were required to predict both the tiger door position and the grid location in the maze. The tiger door inference is necessary to determine which doors can be safely opened in the current grid and can be estimated by listening to tiger roars. It can also be uniquely determined from the maze structure if the current grid location is known. However, inference of the grid is necessary for efficient exploration and can be estimated by gradually narrowing down the possible locations in the maze based on both the previous movement history obtained through multiple moving actions and the tiger door position predicted by listening actions. The two estimation processes are therefore bidirectional rather than independent; however, while the tiger door can be estimated directly from the observation obtained by choosing to listen, the grid can only be estimated indirectly from memory and movement history, which makes it a more challenging problem.

As the Listen and Move trial indices increased, confidence in predicting both the tiger door and the grid location increased, but with different trends. For the Listen trials, confidence in predicting the tiger door increased rapidly even early in the task (Fig. 2c left), and the correlation between the mean confidence level and the trial index was weak (r = 0.35 ± 0.43), whereas for the grid location prediction, confidence increased gradually, i.e., the correlation was strong (r = 0.77 ± 0.27); there was a significant difference in correlation (p = 1.9 × 10⁻³). On the other hand, when the Move was chosen, both the tiger door and grid predictions had a strong correlation between their confidence level and the trial index (tiger door: r = 0.81 ± 0.09, grid: r = 0.83 ± 0.09), with no significant difference in correlation (p = 0.14). The repeated two-way ANOVAs also showed significant effects on the tiger door confidence for both the trial index (F(5,14) = 25.0, p = 1.6 × 10⁻⁶) and the action type (F(1,11) = 72.1, p = 3.7 × 10⁻⁶), whereas for the grid confidence the effect of the trial index only was significant (F(5,14) = 40.7, p = 7.5 × 10⁻⁸; effect of the action type, F(1,11) = 3.0, p = 0.11). The prediction accuracies exhibited similar temporal profiles as the confidence levels for these two types of predictions (Fig. 2d; repeated two-way ANOVA, for the tiger door, effect of the trial index, F(5,14) = 24.9, p = 1.7 × 10⁻⁶; effect of the action type, F(1,11) = 47.7, p = 2.6 × 10⁻⁵; for the grid, effect of the trial index, F(5,14) = 36.7, p = 1.5 × 10⁻⁷; effect of the action type, F(1,11) = 3.0, p = 0.11). The difference in the temporal profiles of the confidence level evolution suggests that these two predictions involve different processes depending on the selected actions.
Furthermore, the state transition frequency matrices show that the state transition from the low tiger door, low grid to the high tiger door, high grid confidence state was often mediated by the high tiger door, low grid confidence state (Fig. 2e–g). These results support the notion that tiger door inference was followed by the grid inference; in other words, the grid was hierarchically inferred after the tiger door inference in the Listen trials. The same statistical results were obtained when the data from the behavioural (outside of the scanner) and imaging experiments were analysed separately (Supplementary Fig. S1).
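The transition-frequency analysis described above can be sketched as a simple count over consecutive trials. This is a minimal illustration, assuming a hypothetical integer coding of the four joint confidence states; the coding, the function name, and the example sequence are not from the source.

```python
import numpy as np

# Hypothetical coding of joint confidence (tiger door / grid):
# 0 = low/low, 1 = high/low, 2 = low/high, 3 = high/high
N_STATES = 4

def transition_counts(state_sequence, n_states=N_STATES):
    """Count transitions between consecutive confidence states within one game."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for current, following in zip(state_sequence[:-1], state_sequence[1:]):
        counts[current, following] += 1
    return counts

# Toy sequence: low/low -> high/low -> high/low -> high/high
counts = transition_counts([0, 1, 1, 3])
print(counts)
```

In this toy sequence, the path from low/low to high/high passes through high/low, which is the kind of mediated transition the frequency matrices are used to detect.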

We implemented a hierarchical inference model that reflects the behavioural results, in which the tiger door position and grid location in the maze are hierarchically inferred as probability distributions. This model assumed that when the participants performed the Listen action, they first inferred the tiger door position and then inferred the grid location. The tiger door was inferred in a Bayesian fashion, i.e., it was updated as the product of the prior information, which is a prediction made in the previous trial, and the newly observed information (the position of the roaring door), weighted by an exponential parameter, delta. The estimated value of this delta parameter (1.8 ± 0.74; Table 1) was greater than 1, indicating that the observed information was given more weight. The grid location was then also updated using Bayesian estimation, with the likelihood information being the extracted grid positions, such that the probability of satisfying the condition of matching the predicted tiger door position was weighted by a parameter beta. Here, it was also assumed that if the observed tiger door position disagreed with the prediction, the participant would re-estimate the grid using the current tiger door prediction (re-estimation), but could continue to update the grid position inferred on the previous trial (which did not match the observation) with probability epsilon. The estimated beta value, which indicates the accuracy of grid extraction from the observations, was high (0.97 ± 0.045), while the epsilon value, which indicates the probability of carrying over an incorrect estimate, was low (0.14 ± 0.29); these results suggest that the participants tended to infer the hidden state accurately from the observations. On the other hand, if they moved to an adjacent grid, the next grid location was inferred from the structure of the maze, which in turn predicted the tiger door position (Fig. 3a; for further details, please refer to Supplementary Fig. 2 and the “Methods” section, specifically the Behavioural model). In both the Listen and Move trials, the grid location had to be estimated from the memory of the maze structure, but to account for participants’ imperfect memory, an error rate gamma was introduced: the mean of the estimated gamma was very small (0.074 ± 0.057) and the individual estimates were negatively correlated with the accuracy of the grid location prediction (r = −0.55, p = 1.3 × 10⁻²). We also examined two alternative models: a top-down model that inferred only the grid location (the tiger door position is uniquely derived from the maze structure based on the grid location inference) and a parallel inference model in which the tiger door and grid location are inferred independently (see the “Methods” section, specifically the Alternative models and Supplementary Fig. 2). Random-effects BIC analysis demonstrated that the hierarchical inference model was the best model to explain the participants’ behaviour, even when considering the additional complexity of the model (Table 1 and Supplementary Fig. 3). Our hierarchical inference model accurately reproduced the participants’ decisions for both predicting the tiger door position (87.2 ± 2.3%) and predicting the grid location (60.9 ± 6.2%). The reproducibility was significantly better than chance, even in trials in which the participants’ predictions were incorrect (Wilcoxon signed-rank test; tiger door, 56.5 ± 6.9%, p = 8.9 × 10⁻⁵; grid, 33.2 ± 5.4%, p = 8.9 × 10⁻⁵).
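The two-step update after a Listen action can be sketched as follows. This is one possible reading of the verbal description, not the authors' implementation: it assumes the delta exponent is applied to the observation likelihood, a hypothetical roar accuracy `p_correct`, three candidate door directions, and a grid likelihood of beta for grids whose tiger door matches the predicted door and 1 − beta otherwise. The re-estimation step (epsilon) and the memory error (gamma) are omitted, and all function names and the toy maze are illustrative.

```python
import numpy as np

def update_tiger_belief(prior, roar_idx, delta, p_correct=0.8):
    """Posterior over tiger door positions: prior * likelihood**delta, normalised.

    p_correct is a hypothetical probability that the roar comes from the
    true tiger door; the remaining mass is spread evenly over other doors.
    """
    prior = np.asarray(prior, dtype=float)
    n_doors = prior.size
    likelihood = np.full(n_doors, (1.0 - p_correct) / (n_doors - 1))
    likelihood[roar_idx] = p_correct
    posterior = prior * likelihood ** delta  # delta > 1 up-weights the observation
    return posterior / posterior.sum()

def update_grid_belief(prior, tiger_door_per_grid, predicted_door, beta):
    """Posterior over grid locations given the predicted tiger door.

    Grids whose (memorised) tiger door matches the prediction get
    likelihood beta, all other grids get 1 - beta (an assumed form).
    """
    prior = np.asarray(prior, dtype=float)
    matches = np.asarray(tiger_door_per_grid) == predicted_door
    likelihood = np.where(matches, beta, 1.0 - beta)
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Toy example: 3 door directions, 4 grids with memorised tiger doors [0, 1, 0, 2].
door_belief = update_tiger_belief([1 / 3, 1 / 3, 1 / 3], roar_idx=0, delta=1.8)
grid_belief = update_grid_belief(np.ones(4) / 4, [0, 1, 0, 2],
                                 predicted_door=int(door_belief.argmax()),
                                 beta=0.97)
print(door_belief, grid_belief)
```

In this toy case, the roar concentrates the door belief on direction 0, and the grid belief then splits between the two grids whose tiger door lies in that direction; movement history would be needed to separate them.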

Table 1 Bayesian model comparison results

Full size table

**Fig. 3: Hierarchical Inference model and model-based behavioural analysis results.**

The entropies of the tiger door position and grid location inferences estimated from the hierarchical inference model corresponded well with the confidence levels reported by the participants for their predictions; that is, the entropy was considerably lower when the confidence levels were high (Fig. 3b, Wilcoxon signed-rank test; tiger door, p = 8.9 × 10⁻⁵; grid, p = 8.9 × 10⁻⁵). In addition, the hierarchical inference model-based entropies were substantially higher before the participants chose to listen in the action phase than before they chose to move (Fig. 3c, Wilcoxon signed-rank test; tiger door, p = 8.9 × 10⁻⁵; grid, p = 8.9 × 10⁻⁵). The entropies of the tiger door and grid inference displayed similar behaviour to the confidence level when plotted over trials. Specifically, in the Listen trials, only the entropy of the tiger door decreased from the beginning, whereas in the Move trials, both entropies decreased gradually as the number of trials increased (Fig. 3d). The correlation between these two types of entropies was much lower in the Listen trials than in the Move trials (Fig. 3e, p = 8.9 × 10⁻⁵). Thus, the model-based entropy derived from the probability distributions estimated from the participants’ behaviours alone exhibited properties similar to the metacognitive judgements of the participants’ subjective confidence. This suggests that the hierarchical inference model can successfully replicate information processing in the participants’ brains.
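The entropy used here as an uncertainty measure can be illustrated with the standard Shannon formula. This is a minimal sketch; the logarithm base (bits) and the function name are assumptions, as the source does not specify them in this section.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A flat belief over three door directions is maximally uncertain,
# while a belief concentrated on one direction has low entropy.
flat = shannon_entropy([1 / 3, 1 / 3, 1 / 3])
peaked = shannon_entropy([0.9, 0.05, 0.05])
print(flat, peaked)
```

A peaked posterior therefore corresponds to low entropy and, by the reasoning in the text, to high reported confidence.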

Brain activity involved in processing the action feedback

The behavioural analysis results suggested that different information processing steps occurred after executing the listening and moving actions during the action phase. This is because the explicit feedback for inference, i.e., the tiger roar, was given only after choosing to listen. In a subtraction analysis comparing brain activity during the action feedback period of the Listen and Move trials, the brain regions displaying significantly higher activation during the Listen trials were widely distributed across the cerebrum, including the right rostrolateral prefrontal cortex (rlPFC) and the parahippocampal gyrus (PHG) (Fig. 4a, Supplementary Fig. 4, and Supplementary Table 1). In contrast, no cluster exhibited significantly higher activity during the Move trials than in the Listen trials.

**Fig. 4: Brain activation involved in processing the action feedback.**

According to our hierarchical inference model, there are two types of information processing in the brain during the Listen trials. One is an updating mode, which strengthens confidence in the previous prediction if the position of the tiger roar matches the tiger door position predicted based on the history. The other is a re-estimation mode, which re-estimates the grid location prediction if the tiger roar is heard from a different position than predicted. Upon comparing brain activity during the action feedback period, we found that neural activity in the dorsomedial prefrontal cortex (dmPFC) and bilateral anterior insula increased in the re-estimation trials compared to the updating trials (Fig. 4b). Activities involved in updating were observed in the supplementary motor area (SMA) and left fusiform gyrus (Fig. 4c, Supplementary Table 2).

Neural correlates involved in the uncertainties during the hierarchical inference

To identify the neural activities involved in the two types of uncertainty of the inference, we first performed a general linear analysis in which the two prediction delays in each trial were modulated by the corresponding prediction confidence levels, which were the participants’ introspective evaluations of the uncertainties: 0.5 (high) or −0.5 (low). Activity in some cortical areas was negatively correlated with confidence levels (Fig. 5a, b, cool colour scale, Supplementary Table 3). We also constructed a general linear model (GLM) using the two types of posterior entropies estimated using our hierarchical inference model. The brain regions that exhibited activity correlated with these posterior entropies included not only the regions of interest associated with the corresponding subjective confidence levels, but also extended to other areas (Szymkiewicz-Simpson coefficient for the tiger door confidence and entropy, SS = 0.80; for the grid confidence and entropy, SS = 0.85; see also Fig. 5a, b, warm colour scale, and Supplementary Table 4). For the tiger door inference, the bilateral insula, thalamus, and putamen exhibited activity that was correlated with increasing entropy (Fig. 5a). In contrast, BOLD activity in the bilateral thalamus, PHG, and putamen was positively correlated with grid inference entropy (Fig. 5b). Both the dmPFC and putamen displayed increased activity that correlated with both tiger door position entropy and grid entropy; however, the localisation was different, that is, the activity associated with tiger door entropy was situated more posteriorly in both regions, while the activity associated with grid entropy was primarily observed in the anterior part (Fig. 5c).
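The Szymkiewicz-Simpson (overlap) coefficient quoted above is the size of the intersection of two sets divided by the size of the smaller set. The following sketch illustrates the definition on toy sets of voxel indices; the function name and the example sets are hypothetical and unrelated to the reported values.

```python
def overlap_coefficient(set_a, set_b):
    """Szymkiewicz-Simpson coefficient: |A & B| / min(|A|, |B|)."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    return len(a & b) / min(len(a), len(b))

# Toy voxel index sets: 4 of the 5 'confidence' voxels also appear
# in the larger 'entropy' set.
confidence_voxels = {1, 2, 3, 4, 5}
entropy_voxels = {2, 3, 4, 5, 6, 7, 8, 9, 10}
print(overlap_coefficient(confidence_voxels, entropy_voxels))  # 0.8
```

Because the denominator is the smaller set, a coefficient near 1 indicates that the smaller map is almost entirely contained in the larger one, even if the larger map extends well beyond it.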

**Fig. 5: Neural correlates involved in uncertainties in hierarchical inference.**

Discussion

In this study, we designed a Tiger maze navigation task to test how the brain enables us to estimate our own location and that of a tiger within a maze based on probabilistic information. We proposed a novel hierarchical model as the generative model of hidden-state navigation, in which our grid location in the maze and the tiger door position interact hierarchically. Our model fits the participants’ prediction performance better than alternative parallel or top-down inference models. The results of the behavioural data analysis revealed that the uncertainties associated with inferring the higher-level hidden state (grid location) and the lower-level hidden state (tiger door position) were resolved over different time scales, and these processes were effectively replicated by our hierarchical inference model (Fig. 3, Supplementary Fig. 3).

In our proposed model, the order of inference depends on the type of action performed before it; i.e., after the listening action (where participants can gather probabilistic information about the tiger door position), the tiger door position is inferred first, with the grid location later inferred on that basis. Conversely, after the moving action (where participants transition between grid locations), the inference is performed in the opposite direction. Note that most previous navigation research has formulated human behaviour within the framework of reinforcement learning and goal-directed planning^27,28,30; in our study, however, navigation behaviour was not modelled, primarily because the reward function was unknown to participants and there was no particular goal state. In most hierarchical models proposed to date, low-level inference has been unidirectionally controlled by high-level inference^10,12,17,32; however, such a model failed to adequately explain participants’ behaviour in the task (Table 1, Supplementary Fig. 3). The interactive information flows between the hierarchical hidden states, as introduced in our model, have been proposed in machine learning studies, particularly in the context of hierarchical neural networks featuring bidirectional information flows^28,29,30, suggesting their potential significance for future endeavours in replicating and understanding complex human information processing. Our hierarchical inference model has the potential to be applied to cognitive problems involving hierarchies, such as transfer learning and social interactive decision-making.

Most previous studies have dealt with maze environments aimed at reaching a goal state, and have formulated navigation behaviour within the framework of reinforcement learning and goal-directed planning^27,28,29. However, our Tiger maze navigation task is a problem setting that focuses on hidden-state inference and the goal state is unknown; therefore, the participants’ navigation behaviour was used as an observed variable rather than being modelled directly by a generative model. Although we did not introduce it into our hierarchical inference model, to avoid complication, it would be possible to model the navigation behaviour based on the goal of exploring as many unvisited states as possible while avoiding opening the tiger door. Specifically, behaviour in our task could be modelled as a hierarchy of two stages: the decision to listen or move and, if moving, the direction in which to move. The former could be formulated as an approach-avoidance conflict model^33,34, where the conflict is between the avoidance behaviour of avoiding the tiger door and the exploration behaviour of reaching the goal. The choice of moving direction could then be determined by the objective function of maximising the explored grids in the maze²⁰.

During the action phase, several brain regions across the cerebral cortex, including the rlPFC and PHG, exhibited significantly greater activation during the Listen trials than in the Move trials (Fig. 4a). Conversely, no regions displayed significantly higher activity in the opposite contrast (Supplementary Fig. 4, Supplementary Table 1). This finding is reasonable, as the cognitive effort is higher during the Listen trials because they provide explicit feedback for updating the inferences. Moreover, the feedback itself, presented as a visual stimulus, is also more complex, and the listening action is more likely to be selected when the confidence level is low. The rlPFC is thought to be activated in contexts involving exploration based on uncertain or probabilistic information^35,36 and updating the possibilities of multiple candidate grids simultaneously^37,38,39, from which the most likely option is selected^40,41. Updating grid location inference requires spatial mental simulation within the memorised maze, a process likely to involve PHG activity, known for processing spatial location information^42,43,44.

The computational analysis using the model parameters allowed for a more refined assessment of the possible roles of different brain regions. Consequently, the imaging analysis employing the hierarchical model can identify brain areas related to specific inference processes during the action phase. In the trials where the hierarchical inference model estimated that the grid location needed re-estimation due to the feedback (that is, tiger roar position) differing from the prediction, activity in the dmPFC and bilateral anterior insula increased (Fig. 4b, Supplementary Table 2). This re-estimation process involves changing one’s beliefs about their position in the maze. Notably, activity in the dmPFC has been previously observed with the re-estimation of hidden states in a comparable partially observable maze navigation task¹⁹. This area is known to play a role in switching behaviour^45,46,47 due to conflict⁴⁸, error detection^49,50 and error prediction^51,52. Hence, we predicted this result a priori. We also note that the anterior insula is known to encode prediction errors in decision-making^53,54,55,56, although this was not one of our a priori predictions.

On the other hand, when the feedback was as predicted, the previous inference (that is, belief about the grid location) was judged to be correct and was updated. Imaging analysis revealed that this was associated with greater activity in the SMA in these trials. In these trials, the participants indeed tended to select the same options as their estimated tiger door position in previous trials (76.5 ± 6.9% after the updating mode; 31.0 ± 10.3% after the re-estimation mode; p = 8.9 × 10⁻⁵). This illustrates a clear functional distinction between dmPFC and SMA responses in the context of this task.

We also compared brain activity in relation to the uncertainty of inference in the prediction phase, both in terms of subjective evaluation (confidence) and entropy estimated by the hierarchical inference model (Fig. 5a, b). The dACC and precentral gyrus, whose activities correlated with both uncertainty indices, have also been reported in previous metacognition studies as brain regions that provide metacognitive control signals (i.e., confidence) in various types of decision-making^57,58,59 and learning¹¹. Subjective confidence levels were based on the participants’ introspective and metacognitive evaluation of the uncertainty of the hidden-state inference reported in a binary form (high or low confidence). In other words, the coarse information extracted from inferences may not be a sufficiently accurate measure for capturing the neural correlates of the inference itself, which can be expressed as a probability distribution. As previous studies suggested that the posterior probability distribution is represented as multivoxel activity patterns in localised brain regions^60,61, the inferences in these regions may be encoded across multiple neuronal populations. Our proposed hierarchical inference model can compute the posterior distribution of state inference from which subjective confidence is derived. Its entropy is a measure of uncertainty regarding inference, which is, therefore, negatively correlated with confidence. Consistent with this model, the brain regions identified by parametric modulation analysis using the two types of entropy (entropy pertaining to tiger door position and grid location; Fig. 5a, b, warm colour scale, Supplementary Table 4) fully encompassed the brain regions that were negatively correlated with confidence levels for both tiger door and grid location predictions (Fig. 5a, b, cool colour scale, Supplementary Table 3).

For both types of inference uncertainty, the subregions of the parietal region exhibited increased activity when confidence was low compared to when confidence was high; however, the active regions differed depending on the type of information being predicted. For tiger door prediction, the participants inferred the hidden state as being in one direction (left, front, or right) based on their position, and we found that the inferior parietal lobule activity displayed orientation sensitivity^62,63 when there were multiple directional possibilities (Fig. 5a). In contrast, the right superior parietal lobule, which is associated with spatial navigation^64,65,66, demonstrated higher activity when confidence in predicting the grid location in the maze was low (Fig. 5b). The parietal cortex is known to be involved in spatial navigation^67,68,69, and these data add to the growing understanding of how it supports computationally precise functions.

We found that the dmPFC and dACC were active when hidden-state inferences (beliefs) tended to be updated. This observation is consistent with previous studies on humans^70,71,72, non-human primates⁷³ and rodents⁷⁴, and supports the idea that the dACC is active in uncertain environments where internal models are updated based on new observations^23,71. The hierarchical nature of the hidden states in this study suggests that different subregions of the dmPFC become active at different levels of the hierarchy, i.e., a higher-level hidden state (grid location) is associated with a more anterior part of the dmPFC. Previous studies have indicated that control abstraction is hierarchically organised along the rostral axis of this area, with more caudal parts involved in lower-order behavioural control, and more rostral parts involved in higher-order strategic control^75,76,77,78. This finding is also supported by research using neurocomputational theory, which involves planning control over multiple levels of abstraction^79,80,81.

Similarly, in the bilateral putamen, the anterior areas were active for the grid inference, which is a higher-level hidden state in the hierarchy, and the posterior areas were active for tiger door inference. To infer the grid location, it is necessary to integrate both uncertain (probabilistic) information and mental simulations based on the maze structure held in memory (internal world model). In contrast, the tiger door position was inferred solely from externally given observations and did not require an internal model. In other words, these inference processes are based on model-based and model-free control, respectively, which is consistent with previous results suggesting that the neural circuits of the anterior and posterior putamen are functionally separable, with the anterior corresponding to associative, goal-directed, model-based control, and the posterior to sensorimotor, habitual, model-free control in humans^82,83,84,85, non-human primates^86,87, and rodents^88,89.

One notable strength of this study lies in the innovative use of computational modelling, which allowed us to reproduce human hierarchical inference processes as a spatial inference problem in the navigation task, and to extract the neural substrates involved in inference using both subjective (i.e., confidence) and objective (i.e., uncertainty) measures. However, it is important to acknowledge that cognitive research using models has its limitations. In our case, the computational model was explicitly hierarchical since the task presented an explicit hierarchy of states, whereas in a real-world environment, it is natural to assume that humans themselves hierarchise or subdivide problems based on the structure of states; our approach may therefore overlook complex and important mechanisms of cognitive phenomena. In addition, conclusive evidence concerning the existence of functional hierarchical information processing among these neural substrates has not been presented. It has also not been demonstrated how the model parameters estimated for each individual are related to their neural activity. In our model, the parameters were involved in modifying the mechanisms that integrate observation and the two types of inference and may control the functional connectivity between these neural substrates. In this study, however, these hypotheses were not assessed, mainly due to the limited temporal resolution of the fMRI signals and the event-related task design. An intriguing avenue for future research lies in exploring the hierarchical functional networks within the brain and their correspondence with the computational models, potentially by employing higher temporal resolution measurements and analytical methods, and by using intervention-based approaches to verify causality in the brain network.

In summary, in this study, we designed a Tiger maze navigation task that combines two typical POMDP tasks: the Tiger problem and the partially observable maze navigation. We proposed a computational model of behaviour that frames behaviour as a spatial inference problem characterised by hierarchical hidden states. Brain imaging analysis has suggested that inference based on information under conditions of multi-layered uncertainty engages different areas of the medial prefrontal cortex and the basal ganglia, depending on the inference’s levels in the hierarchy and the nature of inference (model-based or model-free). Additionally, our imaging results demonstrated that the metacognitive evaluations of uncertainty, or confidence, in the context of hierarchical inference are represented in a cortical frontoparietal network. This enriches our understanding of the neural architecture of complex decision-making processes.

Methods

Subjects

In this study, twenty healthy participants (6 females; aged 20–29 years) were recruited for the experiment, and written informed consent was obtained. This study was approved by the ethics committees of the Advanced Telecommunications Research Institute International, Japan and the Graduate School of Informatics, Kyoto University, Japan. All ethical regulations relevant to human research participants were followed. Prior to undergoing functional magnetic resonance imaging (fMRI) scanning, all participants underwent a pre-experimental training task and engaged in a behavioural experiment concerning the Tiger maze exploration task. The minimum number of participants was defined as 18, based on a power analysis (α = 0.05, 1–β = 0.8), with the effect size calculated from a previous study on hierarchical decision-making¹⁸ using G*Power (http://www.gpower.hhu.de/).

Tiger maze navigation task

Participants explored a virtual 4 × 4 grid maze, where each grid had doors on all four sides, one of which had a tiger behind it (tiger door) that should not be opened (Fig. 1). The maze was designed with topological connections, resembling a torus, connecting the upper (north) and lower (south) boundaries of the maze as well as the left (west) and right (east) boundaries. We used a single common maze for all participants (Fig. 1b) to avoid individual variability in task difficulty. Since both tiger doors and normal doors have an identical appearance, the three-dimensional (3D) scene that participants can observe in the maze (three identical doors on the front, left, and right walls) is the same in all grids. In other words, the maze is partially observable, and the current observation alone does not determine the current grid location.
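The toroidal connectivity described above can be illustrated with modular arithmetic. The following Python sketch is only an illustration under stated assumptions: the heading encoding (0 = north, 1 = east, 2 = south, 3 = west), zero-based row and column indices, the function name `move`, and the rule that participants face the direction of travel after a move are all hypothetical and are not specified in the text.

```python
# Headings: 0 = north, 1 = east, 2 = south, 3 = west (an assumed encoding).
HEADING_STEPS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}
# Door choices relative to the body: left, forward, right.
TURN = {"L": -1, "F": 0, "R": 1}
SIZE = 4  # 4 x 4 grid maze


def move(row, col, heading, door):
    """Return (row, col, heading) after passing through a door on a torus.

    Assumes (hypothetically) that the participant ends up facing the
    direction of travel after the move.
    """
    new_heading = (heading + TURN[door]) % 4
    d_row, d_col = HEADING_STEPS[new_heading]
    # Modulo arithmetic joins the north-south and west-east boundaries.
    return (row + d_row) % SIZE, (col + d_col) % SIZE, new_heading
```

For example, stepping forward (north) from the top row wraps to the bottom row of the same column.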

Each game starts from an unknown initial state, with each trial in the game divided into two phases: an action phase for exploring the maze and a prediction phase for reporting predictions (Fig. 1a). In the action phase, a 3D scene of the current location and body orientation was first displayed, and participants chose to either move (one of three moves: left, forward, or right) or listen (Fig. 1a). If they chose to move through a normal door, a short animation of moving to the selected adjacent grid was displayed. However, if they chose to move through the tiger door, the game ended. When participants chose to listen, a tiger roar was heard from one of the three doors, which turned red and displayed a letter indicating direction (L, F, or R). The roar observation was probabilistic, with an 85% chance of the roar being observed from the tiger door and a 7.5% chance from each normal door; participants were informed of these probabilities. If no action was selected within 4 s, a fixation point was presented for 1.5 s, and the state remained the same until the next trial (0.35 ± 1.1 trials).

In the prediction phase, participants were first asked to predict the position of the tiger door in the next trial and rate their confidence in the prediction. The tiger door prediction was chosen from the three letters indicating the door position (L, F, and R), and the confidence level was chosen from two options: high (H) and low (L), with the positions of the options set randomly for each trial. Participants were then asked to predict the position of the grid in the maze in the next trial and rate their confidence level, as in the case of the tiger door. The prediction of the grid location was reported by selecting the corresponding row and column coordinates on the maze picture in order: the row coordinate (1: top, to 4: bottom), and then the column coordinate (1: left, to 4: right). All choices were required to be made within 2 s, and trials with no choice were excluded from the analysis as missed trials (6.4 ± 5.4%). Each prediction and confidence report was preceded by a delay (4 s) with a fixation cross to allow participants to prepare their choices mentally. During the delay before grid prediction, a 4 × 4 grid was overlaid to encourage mental imagery of the grid location. For both predictions and confidence choices, feedback was provided as a green frame around the chosen option for 1.5 s. However, this feedback did not indicate whether the prediction was correct or not.

The game termination conditions were determined based on participants’ performance. These conditions were defined as having visited more than eight grids or having completed a specified number of trials (10–14 trials). If participants met these conditions, a yellow star was displayed on one of the doors, signifying the achievement of their goal, and the game score was displayed. If participants chose a tiger door, the game concluded with a score of 0. Each game comprised 2–14 trials, with an average of 10.9 ± 2.4 trials per game. Participants were instructed that the more grids they explored and the more correct their predictions were, the more points they would receive (see “Game score” section). Each participant completed a total of 24 games (237.2 ± 22.0 trials) divided into eight sessions: four sessions (12 games) in the behavioural experiment outside the scanner and another four sessions (12 games) in the fMRI scanning experiment, with both experiments performed on the same day.

Game score

The game score was defined by the participant’s performance in both the action and prediction phases as follows:

$${score}=\left\{{N}_{\exp }\times 5-{N}_{{lis}}+{\sum }_{t=1}^{T-1}\left({{rw}}_{{TD},t}+{{rw}}_{{GR},t}\right)\right\}$$

(1)

For the action phase, rewards were added according to the number of grids visited, ${N}_{\exp }$, with a small cost added according to the number of times Listen was chosen, ${N}_{{lis}}$, during the game. The score for the prediction trials, for both the tiger door prediction, ${{rw}}_{{TD},t}$, and the grid prediction, ${{rw}}_{{GR},t}$, was set to be higher when the correct prediction was made with high confidence (see Supplementary Table 5), and was summed over all prediction trials T in the game. The average scores for the 12 games were 37.8 ± 9.5 and 37.9 ± 7.6 in the behavioural and scanning experiments, respectively. The participants were paid a base monetary reward and an incentive based on the number of points they had scored (4700 ± 1100 yen).
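As a worked illustration of Eq. (1), the score can be computed as below. This is a minimal Python sketch; the function name `game_score` is hypothetical, and the per-trial reward values are passed in by the caller because the actual values are given in Supplementary Table 5, which is not reproduced here.

```python
def game_score(n_explored, n_listen, rw_td, rw_gr):
    """Eq. (1): 5 points per visited grid, minus 1 per Listen choice,
    plus the per-trial prediction rewards for tiger door and grid.

    rw_td and rw_gr are sequences of per-trial rewards (t = 1..T-1);
    their values come from Supplementary Table 5 and are supplied by the caller.
    """
    return 5 * n_explored - n_listen + sum(rw_td) + sum(rw_gr)
```

With purely illustrative reward values, `game_score(8, 3, [1, 2], [0, 1])` evaluates to 8 × 5 − 3 + 3 + 1 = 41.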

Training task

Approximately one week before the main experiment, participants performed a training task outside the scanner, in which they learned the task procedure and the structure of the maze. First, participants performed the Tiger maze navigation task, as in the main experiment, for 1 h and 45 min, but with reference to a printed two-dimensional maze map. Subsequently, they performed the same task without a printed map. Participants were compensated with a base payment and no performance-based rewards in the training task. Note that we did not model the participants’ decision-making behaviours, but used their actions as an observed variable for modelling the hidden-state inference.

Behavioural model

We proposed a computational model of participant behaviour using Bayesian methods, namely, a hierarchical inference model. In this model, the order in which the two hidden states are inferred varies depending on the chosen actions (Listen or Move); that is, inferences are made through a bidirectional flow of information from the higher or lower levels of the hierarchy. This model simulates the generative process of the probabilities associated with each option for predicting tiger door position ${s}_{{TD}}\,$($|{s}_{{TD}}|=3$) and grid location ${s}_{{GR}}\,$($|{s}_{{GR}}|=16$) based on the sequence of observable variables: participants’ actions ($a$) and the position of the roaring door ($v$). Throughout the model, variables marked with an asterisk (*) indicate true values, while variables marked with a hat (^) are subjective and inferred by the model. Here, the sequences of the participants’ actions were regarded as the observation variables for this model.

If the Listen action was selected in trial t, the position of the tiger door in trial t + 1 was first determined as follows (step 3a in Supplementary Fig. 2a):

$${P}_{t+1}\left({s}_{{TD},t+1}\right)\propto {P}_{t}\left({s}_{{TD},t}\right)L{\left({s}_{{TD},t}|{v}_{t}\right)}^{\delta }\,\begin{array}{cc}{{{{{\rm{where}}}}}} & L\left({s}_{{TD},t}{{{{{\rm{|}}}}}}{v}_{t}\right)=\left\{\begin{array}{cc}\alpha & {{{{{\rm{if}}}}}} \, {s}_{{TD},t}={v}_{t}\\ \frac{1-\alpha }{2} & {{{{{\rm{otherwise}}}}}}\end{array}\right.\end{array}$$

(2)

where $\alpha$ is the probability of roar observation from a tiger door (0.85) and $\delta$ is the sensitivity parameter that exponentially increases the influence of the information from the new observation. When the observed position of the roaring door (${v}_{t}$) agreed with the tiger door prediction (${\hat{s}}_{{TD}}$), participants credited the history of the observations in the current grid and updated the grid location probabilities. If participants perfectly memorised the maze structure, the grid location probabilities would be updated in the form of Bayesian filtering as follows:

$$\begin{array}{c}{P}_{t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)={P}_{{UD},t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)\\ \begin{array}{c}\propto \left\{\begin{array}{cc}\beta {P}_{t}\left({s}_{{GR},t}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t}\right) & {{{{{\rm{if}}}}}} \, {s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={v}_{t}\\ (1-\beta ){P}_{t}\left({s}_{{GR},t}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t}\right) & {{{{{\rm{if}}}}}} \, {s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right) \, \ne \, {v}_{t}\\ 0 & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={{{{{\rm{back}}}}}}\end{array}\right.\\ \begin{array}{cc}{{{{{\rm{where}}}}}} & {\hat{s}}_{{TD},t+1}\end{array}={{{{{{\rm{argmax}}}}}}}_{{s}_{{TD}}}\,{P}_{t+1}\left({s}_{{TD},t+1}\right)\end{array}\end{array}$$

(3)

${s}_{{TD}}^{* }$ is the true tiger door position in the grid ${s}_{{GR}}$ with an observable body orientation ${d}_{t}$. $\beta$ is the degree of dependence on the tiger door prediction; if $\beta =1$, participants extract grids for which ${s}_{{TD}}^{* }\left({s}_{{GR}},{d}_{t}\right)$ is consistent with ${s}_{{TD}}$ as the candidates and consider all others as unlikely. t’ denotes the index of the first trial after the transfer to the current grid ${s}_{{GR},t}^{* }$. Because the door behind participants was always passable, grids where ${s}_{{TD}}^{* }\left({s}_{{GR}},{d}_{t}\right)$ was the backside should be excluded from the candidates of the grid location (the third case of Eq. (3)). Here, if the participants’ memory was incomplete, i.e., they erred in recalling the maze structure with probability $\gamma$, they would mistakenly update the grid location probability according to the second case in Eq. (3) even when the true tiger door position matched the roaring door position (corresponding to the first case). Similarly, when the true tiger door position was inconsistent with the roaring door position (corresponding to the second case), the update in the first case in Eq. (3) could occur with probability $\gamma$. Participants could also update the grid location where ${s}_{{TD}}^{* }\left({s}_{{GR}},{d}_{t}\right)$ was the backside if they made a mistake about the maze structure (corresponding to the third case). In summary, the dynamics of the grid inference are defined as follows (update mode, step 4a+ in Supplementary Fig. 2a):

$$\begin{array}{c}{P}_{t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)={P}_{{UD},t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)\\ \propto \left\{\begin{array}{cc}\left[\left(1-\gamma \right)\beta +\gamma (1-\beta )\right]{P}_{t}\left({s}_{{GR},t}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t}\right) & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={v}_{t}\\ \left[\gamma \beta +(1-\gamma )(1-\beta )\right]{P}_{t}\left({s}_{{GR},t}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t}\right) & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right) \, \ne \, {v}_{t}\\ \gamma {P}_{t}\left({s}_{{GR},t}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t}\right) & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={{{{{\rm{back}}}}}}\end{array}\right.\\ \begin{array}{cc}{{{{{\rm{where}}}}}} & {\hat{s}}_{{TD},t+1}\end{array}={{{{{{\rm{argmax}}}}}}}_{{s}_{{TD}}}\,{P}_{t+1}\left({s}_{{TD},t+1}\right)\end{array}$$

(4)
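The Listen-trial updates in Eqs. (2) and (4) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' released code: the function names, the integer encoding of door positions (0, 1, 2 for left, front, right, and -1 for the door behind), and the `true_td` lookup array are hypothetical conveniences.

```python
import numpy as np

ALPHA = 0.85  # probability that the roar comes from the tiger door (task setting)


def update_tiger_belief(p_td, v, delta):
    """Eq. (2): multiply the prior by the likelihood raised to the power delta."""
    lik = np.full(3, (1.0 - ALPHA) / 2.0)
    lik[v] = ALPHA
    post = np.asarray(p_td, dtype=float) * lik ** delta
    return post / post.sum()


def update_grid_belief(p_gr, true_td, v, beta, gamma):
    """Eq. (4), update mode.

    true_td[g] is the true tiger door position seen from grid g at the
    current heading: 0, 1, 2 for left, front, right, or -1 for behind
    (an assumed encoding).
    """
    true_td = np.asarray(true_td)
    w = np.where(true_td == v,
                 (1 - gamma) * beta + gamma * (1 - beta),   # first case
                 gamma * beta + (1 - gamma) * (1 - beta))   # second case
    w = np.where(true_td == -1, gamma, w)                   # third case
    post = np.asarray(p_gr, dtype=float) * w
    return post / post.sum()
```

With a uniform prior, a roar from the left door, and $\delta = 1$, `update_tiger_belief` returns the observation probabilities themselves; larger $\delta$ sharpens the posterior towards the roaring door. The grid update assumes that at least one grid receives non-zero weight.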

In contrast, if ${v}_{t}$ was inconsistent with ${\hat{s}}_{{TD},t}$, participants rejected the inference ${P}_{t}\left({s}_{{GR},t}\right)$ based on the previous observations ${v}_{{t}^{{\prime} }:t}$, and re-estimated under the current observation ${v}_{t}$ and the probabilities at trial t’ as the prior distribution (re-estimate mode, step 4a in Supplementary Fig. 2a). Here, it is assumed that even in the re-estimation, participants may (with a certain probability $\varepsilon$) continue to update their inferences based on the previous (unreliable) observations. In summary, the dynamics of the grid inference when ${\hat{s}}_{{TD},t} \, \ne \, {v}_{t}$ are defined as follows:

$$ {P}_{{RE},t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t}\right) \\ =\left\{\begin{array}{c}\begin{array}{cc}\left[\left(1-\gamma \right)\beta +\gamma \left(1-\beta \right)\right]{P}_{{t}^{{\prime} }}\left({s}_{{GR},{t}^{{\prime} }}\right) & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={\hat{s}}_{{TD},t+1}\\ \left[\gamma \beta +\left(1-\gamma \right)\left(1-\beta \right)\right]{P}_{{t}^{{\prime} }}\left({s}_{{GR},{t}^{{\prime} }}\right) & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right) \, \ne \, {\hat{s}}_{{TD},t+1}\\ \gamma {P}_{{t}^{{\prime} }}\left({s}_{{GR},{t}^{{\prime} }}\right) & {{{{{\rm{if}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={{{{{\rm{back}}}}}}\end{array}\end{array}\right.$$

(5)

$$ {P}_{t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)\\ =\left(1-\varepsilon \right){P}_{{RE},t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)+\varepsilon {P}_{{UD},t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{{t}^{{\prime} }:t},{\hat{s}}_{{TD},t+1}\right)$$

(6)

Note that Eq. (5) is derived from Eq. (4) by replacing the prior probability term ${P}_{t}\left({s}_{{GR},t}|{v}_{{t}^{{\prime} }:t}\right)$ with ${P}_{{t}^{{\prime} }}\left({s}_{{GR},{t}^{{\prime} }}\right)$. The grid location was thus inferred using a method that depended on whether the direction of the observed tiger roar matched the participants’ prediction. Here, the MAP estimate (the single most likely direction) was used for the tiger door position, rather than the fully probabilistic estimate. This is because the TD observation information (the direction of the door from which the tiger roars) is inherently probabilistic, and there is only one true tiger door state.
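The re-estimation step in Eqs. (5) and (6) can be sketched as a mixture of two candidate posteriors. The Python sketch below is illustrative only: the function names and the door encoding (0, 1, 2 for left, front, right; -1 for behind) are hypothetical, and normalising each component before mixing is an assumption, since Eq. (5) is written with an equality while Eq. (4) is defined up to proportionality.

```python
import numpy as np


def grid_weights(true_td, target, beta, gamma):
    """Case weights shared by Eqs. (4) and (5); -1 marks the door behind."""
    true_td = np.asarray(true_td)
    w = np.where(true_td == target,
                 (1 - gamma) * beta + gamma * (1 - beta),
                 gamma * beta + (1 - gamma) * (1 - beta))
    return np.where(true_td == -1, gamma, w)


def normalise(p):
    return p / p.sum()


def reestimate_grid_belief(p_now, p_entry, true_td, v, s_hat_td,
                           beta, gamma, eps):
    """Eqs. (5)-(6): blend a restart from the prior held on entering the
    grid (trial t') with the ordinary update of the current belief."""
    # Eq. (5): restart from the entry prior, conditioned on the new MAP door.
    p_re = normalise(np.asarray(p_entry, dtype=float)
                     * grid_weights(true_td, s_hat_td, beta, gamma))
    # Eq. (4): keep updating the current belief with the observed roar.
    p_ud = normalise(np.asarray(p_now, dtype=float)
                     * grid_weights(true_td, v, beta, gamma))
    # Eq. (6): mixture with probability eps of continuing the update.
    return (1 - eps) * p_re + eps * p_ud
```

Setting `eps=0` discards the earlier observations entirely, whereas `eps=1` reduces to the update mode.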

If the participants chose to move to an adjacent grid, the inferences were made in the opposite direction, i.e., the probabilities of the grid location were updated based on the memory of the maze structure, and then the tiger door position was predicted based on that inference (steps 3b and 4b in Supplementary Fig. 2a):

$$\begin{array}{cc}{P}_{t+1}\left({s}_{{GR},t+1}\right)={P}_{t}\left({s}_{{GR},t}\right) & \,\begin{array}{cc}{{{{{\rm{where}}}}}} & {s}_{{GR},t+1}={\rm T}\left({s}_{{GR},t},{d}_{t},{a}_{t}\right)\end{array}\end{array}$$

(7)

$$\begin{array}{c}{P}_{t+1}\left({s}_{{TD},t+1}{{{{{\rm{|}}}}}}{d}_{t+1}\right)=\mathop{\sum }_{{s}_{{GR},t+1}}P({s}_{{TD}}{{{{{\rm{|}}}}}}{s}_{{GR},t+1},{d}_{t+1}){P}_{t+1}\left({s}_{{GR},t+1}\right)\\ \begin{array}{cc}{{{{{\rm{where}}}}}} & P\left({s}_{{TD}}|{s}_{{GR},t+1},{d}_{t+1}\right)=\left\{\begin{array}{cc}1-\gamma & {{{{{\rm{if}}}}}}{s}_{{TD}}={s}_{{TD}}^{* }\left({s}_{{GR},t+1},{d}_{t+1}\right)\\ \frac{\gamma }{3} & {{{{{\rm{otherwise}}}}}}\end{array}\right.\end{array}\end{array}$$

(8)

${\rm T}$ is a fixed transition function, where ${s}^{{\prime} }={\rm T}\left(s,d,a\right)$ indicates that if door a is selected in grid s while facing direction d, the participant moves to grid s’.
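The Move-trial inference in Eqs. (7) and (8) can be sketched as follows. This Python example is a sketch under assumptions: the function name, the `next_grid` lookup array standing in for the transition function, and the door encoding (0, 1, 2 for left, front, right; -1 for behind) are hypothetical.

```python
import numpy as np


def move_update(p_gr, next_grid, true_td_next, gamma):
    """Eqs. (7)-(8): shift the grid belief along the transition function,
    then read out the tiger door belief from it.

    next_grid[g] is the grid reached from g under the chosen door and
    heading; true_td_next[g] is the true tiger door position in grid g
    at the new heading (0, 1, 2 = left, front, right; -1 = behind).
    """
    p_gr = np.asarray(p_gr, dtype=float)
    p_next = np.zeros_like(p_gr)
    np.add.at(p_next, np.asarray(next_grid), p_gr)   # Eq. (7)

    p_td = np.zeros(3)
    for g, p in enumerate(p_next):                   # Eq. (8)
        cond = np.full(3, gamma / 3.0)
        if true_td_next[g] >= 0:
            cond[true_td_next[g]] = 1.0 - gamma
        p_td += cond * p
    return p_next, p_td
```

As written, Eq. (8) assigns $\gamma/3$ to each non-matching option, so for $\gamma > 0$ the three tiger door masses need not sum exactly to one; this sketch leaves any renormalisation to the caller.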

Alternative models

We examined two additional models that assume different inference processes. The first is a top-down inference model, which first infers the grid location on trial t + 1 and then predicts the tiger door position based on the inference of the grid location and the memory of the maze structure (Eq. (8)), regardless of the chosen action. In the Listen trials, the grid location was estimated from the observed tiger roar direction ${v}_{t}$ using incremental Bayesian filtering, as follows:

$${P}_{t+1}\left({s}_{{GR},t+1}{{{{{\rm{|}}}}}}{v}_{t}\right)\propto \left\{\begin{array}{cc}\left[\left(1-\gamma \right)\alpha +\gamma \left(1-\alpha \right)\right]{P}_{t}\left({s}_{{GR},t}\right) & {{{{{\rm{if}}}}}} \, {s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={v}_{t}\\ \left[\gamma \alpha +\left(1-\gamma \right)\left(1-\alpha \right)\right]{P}_{t}\left({s}_{{GR},t}\right) & {{{{{\rm{if}}}}}} \, {s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right) \, \ne \, {v}_{t}\\ \gamma {P}_{t}\left({s}_{{GR},t}\right) & {{{{{\rm{if}}}}}} \, {s}_{{TD}}^{* }\left({s}_{{GR},t},{d}_{t}\right)={{{{{\rm{back}}}}}}\end{array} \right.$$

(9)

In the Move trials, the grid and tiger door positions were inferred using the same method as in the hierarchical model (Eqs. (7) and (8), respectively).

The second is a parallel inference model, which infers the grid location and the tiger door position independently. In this model, the grid location was inferred in the same manner as in the top-down inference model (Eq. (9) in the Listen trials and Eq. (7) in the Move trials). The tiger door position was inferred according to Eq. (2) in the Listen trials. In the Move trials, it was estimated independently of the grid location probability distribution, i.e., based on the probability distribution of the tiger door, the direction chosen in trial t, and the maze structure information, as follows:

$$\begin{array}{c}{P}_{t+1}\left({s}_{{TD},t+1}|{d}_{t+1},{a}_{t}\right)\propto \sum_{{s}_{{TD},t}\in \left\{L,F,R\right\}}P^{\prime} \left({s}_{{TD},t}\right)P\left({s}_{{TD},t+1}|{s}_{{TD},t}\right)\\ =\sum_{{s}_{{TD},t}\in \left\{L,F,R\right\}}P^{\prime} \left({s}_{{TD},t}\right)\frac{N\left({s}_{{TD},t}\to {s}_{{TD},t+1}\right)}{\sum_{{s}_{{TD},t+1}\in \left\{L,F,R\right\}}N\left({s}_{{TD},t}\to {s}_{{TD},t+1}\right)}\\ \begin{array}{cc}{\rm{where}} & P^{\prime} \left({s}_{{TD},t}\right)=\left\{\begin{array}{cc}{P}_{t}\left({s}_{{TD},t}\right) & {\rm{if}} \, {s}_{{TD},t} \, \ne \, {a}_{t}\\ 0 & {\rm{otherwise}}\end{array}\right.\end{array}\end{array}$$

(10)

$$ N\left({s}_{{TD},t}\to {s}_{{TD},t+1}\right)=n\left(S\right)\,\\ \begin{array}{cc}{{{{{\rm{where}}}}}} & S=\left\{{s}_{{GR}}{{{{{\rm{|}}}}}}{s}_{{TD}}^{* }\left({s}_{{GR}},{d}_{t}\right)={s}_{{TD},t}\,\cap \,{s}_{{TD}}^{* }\left({s}_{{GR}}^{{\prime} },{d}_{t},{a}_{t}\right)={s}_{{TD},t+1}\right\}\end{array}$$

(11)

Parameter estimation and model validation

Our proposed hierarchical inference model has four parameters: the sensitivity to the new evidence in the tiger door position inference ($\delta$), the tiger door prediction dependency of the grid inference ($\beta$), the updating probability in the re-estimation mode ($\varepsilon$), and the imperfection of participants’ memory of the maze structure ($\gamma$). The parameter ranges were predetermined as $\delta \in [1, 3]$, $\beta \in [0.5, 0.999]$, $\varepsilon \in [0, 1]$, and $\gamma \in [0, 0.3]$. The model parameters were estimated individually by minimising the negative log evidence (Eqs. (12)–(14)):

$${{NLE}}_{{TD}}=-\log {\prod }_{g=1}^{G}P\left({\hat{{{{{{\boldsymbol{S}}}}}}}}_{{TD},g}^{* }{{{{{\rm{|}}}}}}{{{{{\boldsymbol{\theta }}}}}}\right)=-\log {\prod }_{g=1}^{G}{\prod }_{t=1}^{T\left(g\right)}P\left({\hat{s}}_{{TD},t}^{* }{{{{{\rm{|}}}}}}{{{{{\boldsymbol{\theta }}}}}}\right)$$

(12)

$${{NLE}}_{{GR}}=-\log {\prod }_{g=1}^{G}P\left({\hat{{{{{{\boldsymbol{S}}}}}}}}_{{GR},g}^{* }{{{{{\rm{|}}}}}}{{{{{\boldsymbol{\theta }}}}}}\right)=-\log {\prod }_{g=1}^{G}{\prod }_{t=1}^{T\left(g\right)}P\left({\hat{s}}_{{GR},t}^{* }{{{{{\rm{|}}}}}}{{{{{\boldsymbol{\theta }}}}}}\right)$$

(13)

$$\begin{array}{cc}{{NLE}}_{{tot}}={\eta }_{{TD}}{{NLE}}_{{TD}}+{\eta }_{{GR}}{{NLE}}_{{GR}} & {{{{{\rm{where}}}}}} \, {\eta }_{{TD}}=\frac{1}{\log \left(\left|{s}_{{TD}}\right|\right)},\,{\eta }_{{GR}}=\frac{1}{\log \left(\left|{s}_{{GR}}\right|\right)}\,\end{array}$$

(14)

Here, G is the number of games, and ${\hat{{{{{{\boldsymbol{S}}}}}}}}_{{TD},g}^{* }$ and ${\hat{{{{{{\boldsymbol{S}}}}}}}}_{{GR},g}^{* }$ are the sequences of the tiger door and grid prediction reported by the participants, respectively, where $T(g)$ is the number of trials in the game g. The set of the model parameters is denoted as ${{{{{\boldsymbol{\theta }}}}}}$. ${\eta }_{{TD}}$ and ${\eta }_{{GR}}$ are scaling parameters that compensate for differences in the size of the state space between ${s}_{{TD}}$ and ${s}_{{GR}}$; $|{s}_{{TD}}|$ (=3) and $|{s}_{{GR}}|$ (=16) are the number of possible options for the tiger door and the grid, respectively. BIC was used for Bayesian model selection (Table 1). To avoid data circularity, we used data from 12 games in the behavioural experiment for parameter selection and data from the scanning experiment for model validation. When comparing the proposed hierarchical inference model with the alternative models (the top-down inference model and the parallel inference model), we used the parameter values estimated for each model using Eqs. (12)–(14) (${{{{{\boldsymbol{\theta }}}}}}=\{\gamma \}$ for the top-down model; ${{{{{\boldsymbol{\theta }}}}}}=\{\delta ,\gamma \}$ for the parallel model).
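The objective in Eqs. (12)–(14) can be illustrated with a short Python sketch. The function names are hypothetical, and the inputs are assumed to be the model probabilities assigned to the options that participants actually reported, pooled over games and trials.

```python
import math


def negative_log_evidence(probs):
    """Eqs. (12)-(13): minus the sum of log probabilities that the model
    assigned to the reported predictions."""
    return -sum(math.log(p) for p in probs)


def total_nle(p_td_reported, p_gr_reported, n_td=3, n_gr=16):
    """Eq. (14): weight each term by 1 / log(number of options)."""
    return (negative_log_evidence(p_td_reported) / math.log(n_td)
            + negative_log_evidence(p_gr_reported) / math.log(n_gr))
```

One consequence of the scaling is that a model at chance (1/3 for every tiger door report and 1/16 for every grid report) contributes exactly one unit per report in both terms, which puts the two state spaces on a comparable footing. A reported option with zero model probability would make the logarithm undefined, so a practical implementation would probably need a small floor on the probabilities; that detail is not described in the text.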

In validating our proposed hierarchical model, we also used the agreement between the model’s predictions of the states (tiger door, ${\widehat{{TD}}}_{t}$; grid, ${\widehat{{GR}}}_{t}$) and the states that the participants actually reported as their predictions (${{TD}}_{t}^{* }$ and ${{GR}}_{t}^{* }$, respectively). The two types of states were predicted by

$${\widehat{{TD}}}_{t}={{{{{{\rm{argmax}}}}}}}_{{s}_{{TD}}}\,{P}_{t+1}\left({s}_{{TD},t+1}{{{{{\rm{|}}}}}}{d}_{t+1}\right)$$

(15)

$${\widehat{{GR}}}_{t}={{{{{{\rm{argmax}}}}}}}_{{s}_{{GR}}}\,{P}_{t+1}\left({s}_{{GR},t+1}\right)$$

(16)

When Eqs. (15) and (16) yielded multiple equally probable maximum a posteriori (MAP) states, we assumed that the model randomly selected one of them as its prediction.
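The MAP readout in Eqs. (15) and (16), including the random tie-breaking, can be sketched in Python as follows. The function name and the numerical tolerance `tol` used to detect ties are assumptions introduced for illustration.

```python
import numpy as np


def map_with_random_ties(p, rng, tol=1e-12):
    """Return the index of the largest probability, choosing uniformly
    at random among states that tie for the maximum."""
    p = np.asarray(p, dtype=float)
    candidates = np.flatnonzero(p >= p.max() - tol)
    return int(rng.choice(candidates))
```

A caller would pass a seeded generator, for example `np.random.default_rng(0)`, so that simulated predictions are reproducible.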

Image acquisition and analysis

A 3.0-Tesla Siemens MAGNETOM Prisma fit scanner (Siemens Healthineers, Erlangen, Germany) with a standard 64-channel phased-array head coil was used for image acquisition. We acquired interleaved T2*-weighted echo-planar images (TR, 1000 ms; TE, 30 ms; flip angle, 50°; matrix size, 100 × 100; field of view, 200 × 200; voxel size, 2 × 2 × 2.5 mm; and number of slices, 66). Volume acquisition was synchronised with the onset of each delay period. We also acquired whole-brain high-resolution T1-weighted structural images using a standard MPRAGE sequence (TR, 2250 ms; TE, 3.06 ms; flip angle, 9°; field of view, 256 × 256; voxel size, 1 × 1 × 1 mm).

Imaging data were analysed using SPM12 (Wellcome Department of Cognitive Neurology, London, UK). For each participant, all functional images were preprocessed, including slice-timing correction, spatial realignment, co-registration to the individual high-resolution anatomical image, normalisation to an MNI template, and smoothing with a Gaussian kernel filter (FWHM, 8 mm). In addition, high-pass filtering with a cut-off of 128 s was applied to remove low-frequency drifts from the signal.

Our fMRI analyses were conducted using standard GLMs, which employ event-related regressors convolved with the canonical hemodynamic response function. The basic first-level design matrix of the GLMs included a constant term, 6 motion parameters as nuisance regressors, and 12 event-related regressors in each trial (events indicated by numbers in Fig. 1a) per session. The four regressors for the delay periods before the prediction and confidence report (corresponding to steps 4, 6, 8, and 11 in Fig. 1a) were modelled as boxcar functions with a duration of 4 s, whereas the other regressors were modelled as delta functions. The regression analyses were conducted for the following four GLMs, each of which had additional specific regressors of interest in the model (see below).

GLM1: The action feedback period (Step 3 in Fig. 1a) was modelled as two independent events according to the action selected by the participant (Listen and Move) to identify the brain regions involved in feedback-based processing based on the types of actions (Fig. 4a, Supplementary Fig. 3).

GLM2: The action feedback period in the Listen trial (step 3a in Fig. 1a) was modelled as two independent modes of information processing based on our hierarchical inference model (re-estimation and updating modes) to identify the brain regions in the different inference procedures after the same action (Fig. 4b, c).

GLM3: For the tiger door position and grid location prediction events (steps 4 and 8 in Fig. 1a), the participant’s subjective confidence level in each prediction (0.5 for high confidence and −0.5 for low confidence) was introduced as a parametric modulator to examine confidence-related brain activity (Fig. 5, cool colour scale). Each prediction onset was modulated only by the corresponding confidence level.

GLM4: For the tiger door position and grid location predictions (steps 4 and 8 in Fig. 1a), the entropy of each inference, calculated based on the computational model (hierarchical inference model, Eqs. (17) and (18)), was introduced as a parametric modulator (Fig. 5, warm colour scale). The variables were scaled to the range [0,1] and mean-centred (that is, orthogonalized for the constant term).

$${U}_{{TD},t}=-{\sum }_{{s}_{{TD}}}{P}_{t}\left({s}_{{TD}}\right)\log {P}_{t}\left({s}_{{TD}}\right)$$

(17)

$${U}_{{GR},t}=-{\sum }_{{s}_{{GR}}}{P}_{t}\left({s}_{{GR}}\right)\log {P}_{t}\left({s}_{{GR}}\right)$$

(18)
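The entropy regressors in Eqs. (17) and (18) and their preparation as parametric modulators can be sketched in Python as follows. The function names are hypothetical, and min-max scaling is an assumption: the text states only that the variables were scaled to the range [0,1] and mean-centred.

```python
import numpy as np


def entropy(p):
    """Eqs. (17)-(18): Shannon entropy, treating 0 log 0 as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())


def parametric_modulator(values):
    """Scale trial-wise values to [0, 1] (min-max, an assumed choice),
    then subtract the mean so the regressor is mean-centred."""
    x = np.asarray(values, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())
    return scaled - scaled.mean()
```

A uniform belief over the three door positions gives the maximum tiger door entropy, log 3, and a belief concentrated on a single state gives zero. The scaling step assumes that the trial-wise values are not all identical.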

A random-effects analysis was performed at the group level using an anatomically localised cerebral cortex. Statistical thresholds were set at the voxel level of p < 0.001 (uncorrected) and the cluster level of p < 0.05 (FWE-corrected).

Statistics and reproducibility

We analysed data from N = 20 healthy human participants. The task was implemented using Psychopy3⁹⁰. The behavioural analyses were performed using MATLAB R2023a (Mathworks, Natick, Massachusetts, US) and R Statistical Software version 4.2.3⁹¹. The imaging analyses were performed using SPM12.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Anonymized behavioural data are available on GitHub (https://github.com/RisaKatayama/article-belief-hierarchical-spatial.git). Unthresholded group-level statistical maps are available on NeuroVault (https://neurovault.org/collections/NZJMDMFQ/).

Code availability

All code supporting the main results of this study is available on GitHub (https://github.com/RisaKatayama/article-belief-hierarchical-spatial.git).

References

Zhao, W. & Chen, W. Hierarchical POMDP planning for object manipulation in clutter. Robot. Auton. Syst. 139, 103736 (2021).
Serrano, S. A., Santiago, E., Martinez-Carranza, J., Morales, E. F. & Sucar, L. E. Knowledge-based hierarchical POMDPs for task planning. J. Intell. Robot. Syst. Theory Appl. 101, 1–23 (2021).
Theocharous, G., Rohanimanesh, K. & Mahadevan, S. Learning hierarchical partially observable Markov decision process models for robot navigation. In Proc. IEEE International Conference on Robotics and Automation Vol. 1 511–516 (IEEE, 2001).
Theocharous, G. & Mahadevan, S. Approximate planning with hierarchical partially observable Markov decision process models for robot navigation. In Proc. IEEE International Conference on Robotics and Automation Vol. 2 1347–1352 (IEEE, 2002).
Qian, K., Ma, X., Dai, X., Fang, F. & Zhou, B. Decision-theoretical navigation of service robots using POMDPs with human-robot co-occurrence prediction. Int. J. Adv. Robot. Syst. 10, 143 (2013).
Gebauer, C., Dengler, N. & Bennewitz, M. Sensor-based navigation using hierarchical reinforcement learning. Lecture Notes Netw. Syst. 577, 546–560 (2023).
Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A. & Oliva, A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci. Rep. 6, 1–13 (2016).
Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
Sarafyazd, M. & Jazayeri, M. Hierarchical reasoning by neural circuits in the frontal cortex. Science 364, eaav8911 (2019).
Meyniel, F. & Dehaene, S. Brain networks for confidence weighting and hierarchical inference during probabilistic learning. Proc. Natl Acad. Sci. USA 114, E3859–E3868 (2017).
Weilnhammer, V. A., Stuke, H., Sterzer, P. & Schmack, K. The neural correlates of hierarchical predictions for perceptual decisions. J. Neurosci. 38, 5008–5021 (2018).
Kawato, M., Furukawa, K. & Suzuki, R. A hierarchical neural-network model for control and learning of voluntary movement. Biol. Cybern. 57, 169–185 (1987).
Ikegami, T. et al. Hierarchical motor adaptations negotiate failures during force field learning. PLoS Comput. Biol. 17, 1–28 (2021).
Stringer, S. M., Rolls, E. T. & Taylor, P. Learning movement sequences with a delayed reward signal in a hierarchical model of motor function. Neural Netw. 20, 172–181 (2007).
Ong, W. S., Madlon-Kay, S. & Platt, M. L. Neuronal correlates of strategic cooperation in monkeys. Nat. Neurosci. 24, 116–128 (2021).
Yoshida, W., Seymour, B., Friston, K. J. & Dolan, R. J. Neural mechanisms of belief inference during cooperative games. J. Neurosci. 30, 10744–10751 (2010).
Diuk, C., Tsai, K., Wallis, J., Botvinick, M. & Niv, Y. Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia. J. Neurosci. 33, 5797–5805 (2013).
Yoshida, W. & Ishii, S. Resolution of uncertainty in prefrontal cortex. Neuron 50, 781–789 (2006).
Katayama, R., Yoshida, W. & Ishii, S. Confidence modulates the decodability of scene prediction during partially-observable maze exploration in humans. Commun. Biol. 5, 1–14 (2022).
Shikauchi, Y. & Ishii, S. Decoding the view expectation during learned maze navigation from human fronto-parietal network. Sci. Rep. 5, 1–13 (2015).
Monosov, I. E. Anterior cingulate is a source of valence-specific information about value and uncertainty. Nat. Commun. 8, 1–11 (2017).
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Ting, C. C. et al. Neural mechanisms for integrating prior knowledge and likelihood in value-based probabilistic inference. J. Neurosci. 35, 1792–1805 (2015).
Kolling, N., Behrens, T. E. J., Wittmann, M. K. & Rushworth, M. F. S. Multiple signals in anterior cingulate cortex. Curr. Opin. Neurobiol. 37, 36–43 (2016).
Cassandra, A. R., Kaelbling, L. P. & Littman, M. L. Acting optimally in partially observable stochastic domains. Proc. Twelfth Natl Conf. Artif. Intell. 132, 1023–1028 (1995).
Simon, D. A. & Daw, N. D. Neural correlates of forward planning in a spatial decision task in humans. J. Neurosci. 31, 5526–5539 (2011).
de Cothi, W. et al. Predictive maps in rats and humans for spatial navigation. Curr. Biol. 32, 3676–3689.e5 (2022).
Anggraini, D., Glasauer, S. & Wunderlich, K. Neural signatures of reinforcement learning correlate with strategy adoption during spatial navigation. Sci. Rep. 8, 1–14 (2018).
Zhu, S., Lakshminarasimhan, K. J., Arfaei, N. & Angelaki, D. E. Eye movements reveal spatiotemporal dynamics of visually-informed planning in navigation. eLife 11, 1–34 (2022).
Epstein, R. & Kanwisher, N. A cortical representation of the local visual environment. Nature 392, 598–601 (1998).
Yoshida, W., Funakoshi, H. & Ishii, S. Hierarchical rule switching in prefrontal cortex. Neuroimage 50, 314–322 (2009).
Amemori, K. I. & Graybiel, A. M. Localized microstimulation of primate pregenual cingulate cortex induces negative decision-making. Nat. Neurosci. 15, 776–785 (2012).
Zorowitz, S. et al. The neural basis of approach-avoidance conflict: a model based analysis. eNeuro 6, 1–12 (2019).
Badre, D., Doll, B. B., Long, N. M. & Frank, M. J. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron 73, 595–607 (2012).
Tomov, M. S., Truong, V. Q., Hundia, R. A. & Gershman, S. J. Dissociable neural correlates of uncertainty underlie different exploration strategies. Nat. Commun. 11, 1–12 (2020).
Badre, D. & D’Esposito, M. Functional magnetic resonance imaging evidence for a hierarchical organization of the prefrontal cortex. J. Cogn. Neurosci. 19, 2082–2099 (2007).
Braver, T. S. & Bongiolatti, S. R. The role of frontopolar cortex in subgoal processing during working memory. Neuroimage 15, 523–536 (2002).
Koechlin, E., Ody, C. & Kouneiher, F. The architecture of cognitive control in the human prefrontal cortex. Science 302, 1181–1185 (2003).
Badre, D. & Wagner, A. D. Selection, integration, and conflict monitoring: assessing the nature and generality of prefrontal cognitive control mechanisms. Neuron 41, 473–487 (2004).
Wolfensteller, U. & von Cramon, D. Y. Strategy-effects in prefrontal cortex during learning of higher-order S-R rules. Neuroimage 57, 598–607 (2011).
Aguirre, G. K., Detre, J. A., Alsop, D. C. & D’Esposito, M. The parahippocampus subserves topographical learning in man. Cereb. Cortex 6, 823–829 (1996).
Aguirre, G. K., Zarahn, E. & D’Esposito, M. Neural components of topographical representation. Proc. Natl Acad. Sci. USA 95, 839–846 (1998).
Owen, A. M., Milner, B., Petrides, M. & Evans, A. C. A specific role for the right parahippocampal gyrus in the retrieval of object-location: a positron emission tomography study. J. Cogn. Neurosci. 8, 588–602 (1996).
Fleming, S. M., Van Der Putten, E. J. & Daw, N. D. Neural mediators of changes of mind about perceptual decisions. Nat. Neurosci. 21, 617–624 (2018).
Fleck, M. S., Daselaar, S. M., Dobbins, I. G. & Cabeza, R. Role of prefrontal and anterior cingulate regions in decision-making processes shared by memory and nonmemory tasks. Cereb. Cortex 16, 1623–1630 (2006).
Heereman, J., Walter, H. & Heekeren, H. R. A task-independent neural representation of subjective certainty in visual perception. Front. Hum. Neurosci. 9, 1–12 (2015).
Botvinick, M. M., Carter, C. S., Braver, T. S., Barch, D. M. & Cohen, J. D. Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652 (2001).
Holroyd, C. B. & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 109, 679–709 (2002).
Boldt, A. & Yeung, N. Shared neural markers of decision confidence and error detection. J. Neurosci. 35, 3478–3484 (2015).
Brown, J. W. & Braver, T. S. Learned predictions of error likelihood in the anterior cingulate cortex. Science 307, 1118–1121 (2005).
Jessup, R. K., Busemeyer, J. R. & Brown, J. W. Error effects in anterior cingulate cortex reverse when error likelihood is high. J. Neurosci. 30, 3467–3472 (2010).
Preuschoff, K., Quartz, S. R. & Bossaerts, P. Human insula activation reflects risk prediction errors as well as risk. J. Neurosci. 28, 2745–2752 (2008).
Loued-Khenissi, L., Pfeuffer, A., Einhäuser, W. & Preuschoff, K. Anterior insula reflects surprise in value-based decision-making and perception. Neuroimage 210, 116549 (2020).
Billeke, P. et al. Human anterior insula encodes performance feedback and relays prediction error to the medial prefrontal cortex. Cereb. Cortex 30, 4011–4025 (2020).
Bastin, J. et al. Direct recordings from human anterior insula reveal its leading role within the error-monitoring network. Cereb. Cortex 27, 1545–1557 (2017).
Su, J., Jia, W. & Wan, X. Task-specific neural representations of generalizable metacognitive control signals in the human dorsal anterior cingulate cortex. J. Neurosci. 42, 1275–1291 (2022).
Pereira, M. et al. Disentangling the origins of confidence in speeded perceptual judgments through multimodal imaging. Proc. Natl Acad. Sci. USA 117, 8382–8390 (2020).
Fleming, S. M., Huijgen, J. & Dolan, R. J. Prefrontal contributions to metacognition in perceptual decision making. J. Neurosci. 32, 6117–6125 (2012).
Glaser, J. I., Perich, M. G., Ramkumar, P., Miller, L. E. & Kording, K. P. Population coding of conditional probability distributions in dorsal premotor cortex. Nat. Commun. 9, 1788 (2018).
Chan, S. C. Y., Niv, Y. & Norman, K. A. A probability distribution over latent causes, in the orbitofrontal cortex. J. Neurosci. 36, 7817–7828 (2016).
Vilares, I., Howard, J. D., Fernandes, H. L., Gottfried, J. A. & Kording, K. P. Differential representations of prior and likelihood uncertainty in the human brain. Curr. Biol. 22, 1641–1648 (2012).
Plaza, P., Cuevas, I., Grandin, C., De Volder, A. G. & Renier, L. Looking into task-specific activation using a prosthesis substituting vision with audition. ISRN Rehabil. 2012, 1–15 (2012).
Chen, Y. et al. Allocentric versus egocentric representation of remembered reach targets in human cortex. J. Neurosci. 34, 12515–12526 (2014).
Lester, B. D. & Dassonville, P. The role of the right superior parietal lobule in processing visual context for the establishment of the egocentric reference frame. J. Cogn. Neurosci. 26, 2201–2209 (2014).
Neggers, S. F. W., Van der Lubbe, R. H. J., Ramsey, N. F. & Postma, A. Interactions between ego- and allocentric neuronal representations of space. Neuroimage 31, 320–331 (2006).
Rodriguez, P. F. Neural decoding of goal locations in spatial navigation in humans with fMRI. Hum. Brain Mapp. 31, 391–397 (2010).
Sherrill, K. R. et al. Functional connections between optic flow areas and navigationally responsive brain regions during goal-directed navigation. Neuroimage 118, 386–396 (2015).
Spiers, H. J. & Maguire, E. A. A navigational guidance system in the human brain. Hippocampus 17, 618–626 (2007).
Schwartenbeck, P., FitzGerald, T. H. B. & Dolan, R. Neural signals encoding shifts in beliefs. Neuroimage 125, 578–586 (2016).
O’Reilly, J. X. et al. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proc. Natl Acad. Sci. USA 110, E3660–E3669 (2013).
Boorman, E. D., Rajendran, V. G., O’Reilly, J. X. & Behrens, T. E. Two anatomically and computationally distinct learning signals predict changes to stimulus-outcome associations in hippocampus. Neuron 89, 1343–1354 (2016).
Hunt, L. T. et al. Triple dissociation of attention and decision computations across prefrontal cortex. Nat. Neurosci. 21, 1471–1481 (2018).
Starkweather, C. K., Gershman, S. J. & Uchida, N. The medial prefrontal cortex shapes dopamine reward prediction errors under state uncertainty. Neuron 98, 616–629.e6 (2018).
Shenhav, A., Straccia, M. A., Musslick, S., Cohen, J. D. & Botvinick, M. M. Dissociable neural mechanisms track evidence accumulation for selection of attention versus action. Nat. Commun. 9, 2485 (2018).
Taren, A. A., Venkatraman, V. & Huettel, S. A. A parallel functional topography between medial and lateral prefrontal cortex: evidence and implications for cognitive control. J. Neurosci. 31, 5026 (2011).
Venkatraman, V., Rosati, A. G., Taren, A. A. & Huettel, S. A. Resolving response, decision, and strategic control: evidence for a functional topography in dorsomedial prefrontal cortex. J. Neurosci. 29, 13158 (2009).
Zarr, N. & Brown, J. W. Hierarchical error representation in medial prefrontal cortex. Neuroimage 124, 238–247 (2016).
Holroyd, C. B. & McClure, S. M. Hierarchical control over effortful behavior by rodent medial frontal cortex: a computational model. Psychol. Rev. 122, 54–83 (2015).
Shenhav, A., Botvinick, M. M. & Cohen, J. D. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron 79, 217–240 (2013).
Vassena, E., Holroyd, C. B. & Alexander, W. H. Computational models of anterior cingulate cortex: at the crossroads between prediction and effort. Front. Neurosci. 11, 316 (2017).
de Wit, S. et al. Corticostriatal connectivity underlies individual differences in the balance between habitual and goal-directed action control. J. Neurosci. 32, 12066 (2012).
Tricomi, E., Balleine, B. W. & O’Doherty, J. P. A specific role for posterior dorsolateral striatum in human habit learning. Eur. J. Neurosci. 29, 2225–2232 (2009).
Horga, G. et al. Changes in corticostriatal connectivity during reinforcement learning in humans. Hum. Brain Mapp. 36, 793–803 (2015).
Wan Lee, S., Shimojo, S. & O’Doherty, J. P. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687 (2014).
Duan, L. Y. et al. Controlling one’s world: identification of sub-regions of primate PFC underlying goal-directed behavior. Neuron 109, 2485 (2021).
Fujimoto, A. et al. Signaling incentive and drive in the primate ventral pallidum for motivational control of goal-directed action. J. Neurosci. 39, 1793–1804 (2019).
Turner, K. M., Svegborn, A., Langguth, M., McKenzie, C. & Robbins, T. W. Opposing roles of the dorsolateral and dorsomedial striatum in the acquisition of skilled action sequencing in rats. J. Neurosci. 42, 2039–2051 (2022).
Gremel, C. M. & Costa, R. M. Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nat. Commun. 4, 2264 (2013).
Peirce, J. et al. PsychoPy2: experiments in behavior made easy. Behav. Res. Methods 51, 195–203 (2019).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing: Vienna, Austria, 2017).

Acknowledgements

This study was supported by a project (No. P20006) subsidised by the New Energy and Industrial Technology Development Organization (NEDO) and by JSPS KAKENHI (No. 22H04998 and 23H04676), Japan. W.Y. was funded by MRC/Versus Arthritis (MR/W027593/1) and the Wellcome Trust (203139/Z/16/Z and 203139/A/16/Z), and the NIHR Oxford Health Biomedical Research Centre (views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care). R.K. was funded by JST, the establishment of a university fellowship toward the creation of science technology innovation (Grant Number JPMJFS2123), Japan. The funding agencies had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. The authors thank B. Seymour for invaluable comments to improve this study.

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
Risa Katayama, Ryo Shiraki & Shin Ishii
Department of AI-Brain Integration, Advanced Telecommunications Research Institute International, Kyoto, 619-0288, Japan
Risa Katayama
Neural Information Analysis Laboratories, Advanced Telecommunications Research Institute International, Kyoto, 619-0288, Japan
Shin Ishii
International Research Center for Neurointelligence, the University of Tokyo, Tokyo, 113-0033, Japan
Shin Ishii
Department of Neural Computation for Decision-Making, Advanced Telecommunications Research Institute International, Kyoto, 619-0288, Japan
Wako Yoshida
Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
Wako Yoshida

Contributions

S.I. conceived the project; R.K., R.S., S.I. and W.Y. designed the research; R.S. performed the experiments; R.K. analysed the data; R.K. wrote the draft; and R.K., S.I. and W.Y. prepared the final manuscript.

Corresponding author

Correspondence to Risa Katayama.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Zhe Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Benjamin Bessieres. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer review file

Supplementary information

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Katayama, R., Shiraki, R., Ishii, S. et al. Belief inference for hierarchical hidden states in spatial navigation. Commun Biol 7, 614 (2024). https://doi.org/10.1038/s42003-024-06316-0

Received: 22 October 2023
Accepted: 10 May 2024
Published: 21 May 2024
DOI: https://doi.org/10.1038/s42003-024-06316-0
Springer Nature Limited
