# Generalised free energy and active inference


## Abstract

Active inference is an approach to understanding behaviour that rests upon the idea that the brain uses an internal generative model to predict incoming sensory data. The fit between this model and data may be improved in two ways. The brain could optimise probabilistic beliefs about the variables in the generative model (i.e. perceptual inference). Alternatively, by acting on the world, it could change the sensory data, such that they are more consistent with the model. This implies a common objective function (variational free energy) for action and perception that scores the fit between an internal model and the world. We compare two free energy functionals for active inference in the framework of Markov decision processes. One of these is a functional of beliefs (i.e. probability distributions) about states and policies, but a function of observations, while the second is a functional of beliefs about all three. In the former (*expected* free energy), prior beliefs about outcomes are not part of the generative model (because they are absorbed into the prior over policies). Conversely, in the second (*generalised* free energy), priors over outcomes become an explicit component of the generative model. When using the free energy function, which is blind to future observations, we equip the generative model with a prior over policies that ensure preferred (i.e. priors over) outcomes are realised. In other words, if we expect to encounter a particular kind of outcome, this lends plausibility to those policies for which this outcome is a consequence. In addition, this formulation ensures that selected policies minimise uncertainty about future outcomes by minimising the free energy expected in the future. When using the free energy functional—that effectively treats future observations as hidden states—we show that policies are inferred or selected that realise prior preferences by minimising the free energy of future expectations. 
Interestingly, the form of posterior beliefs about policies (and associated belief updating) turns out to be identical under both formulations, but the quantities used to compute them are not.

## Keywords

Bayesian, Active inference, Free energy, Data selection, Epistemic value, Intrinsic motivation

## 1 Introduction

*planning as inference* scheme (Attias 2003; Baker et al. 2009; Botvinick and Toussaint 2012; Verma and Rao 2006) has a pleasingly broad explanatory scope, accounting for a range of phenomena in cognitive neuroscience, active vision and motor control (see Table 1). In this paper, we revisit the role of (expected) free energy in active inference and offer an alternative, simpler and more general formulation. This formulation does not substantially change the message passing or belief updating; however, it provides an interesting perspective on planning as inference and the way that we may perceive the future.

Table 1 Applications of active inference for Markov decision processes

Application | Comment | References |
---|---|---|
Decision making under uncertainty | Initial formulation of active inference for | Friston et al. (2012c) |
Optimal control (the mountain car problem) | Illustration of | Friston et al. (2012a) |
Evidence accumulation: Urns task | Demonstration of how belief states are absorbed into a generative model | |
Addiction | Application to psychopathology | Schwartenbeck et al. (2015c) |
Dopaminergic responses | Associating dopamine with the encoding of (expected) precision provides a plausible account of dopaminergic discharges | |
Computational fMRI | Using Bayes optimal precision to predict activity in dopaminergic areas | Schwartenbeck et al. (2015a) |
Choice preferences and epistemics | Empirical testing of the hypothesis that people prefer to keep options open | Schwartenbeck et al. (2015b) |
Behavioural economics and trust games | Examining the effects of prior beliefs about self and others | |
Foraging and two-step mazes; navigation in deep mazes | Formulation of epistemic and pragmatic value in terms of | Friston et al. (2015) |
Habit learning, reversal learning and devaluation | Learning as minimising variational free energy with respect to model parameters, and action selection as | |
Saccadic searches and scene construction | | |
Electrophysiological responses: | Simulating neuronal processing with a gradient descent on variational free energy, c.f., dynamic | Friston et al. (2017a) |
Structure learning, sleep and insight | Inclusion of parameters into expected free energy to enable structure learning via | Friston et al. (2017b) |
Narrative construction and reading | Hierarchical generalisation of generative model with | |
Computational neuropsychology | Simulation of visual neglect, hallucinations and prefrontal syndromes under alternative pathological priors | Benrimoh et al. (2018), Parr and Friston (2017a), Parr et al. (2018a, b, 2019) |
Neuromodulation | Use of precision parameters to manipulate exploration during saccadic searches; associating uncertainty with cholinergic and noradrenergic systems | Parr and Friston (2017b, 2019), Sales et al. (2018), Vincent et al. (2019) |
Decisions to movements | Hybrid continuous and discrete generative models to implement decisions through movement | |
Planning, navigation and niche construction | Agent-induced changes in environment (generative process); decomposition of goals into subgoals | |

In current descriptions of active inference, the basic argument goes as follows: active inference is based upon the maximisation of model evidence or minimisation of variational free energy in two complementary ways. First, one can update one’s beliefs about latent or hidden states of the world to make them consistent with observed evidence. Second, one can actively sample the world to make observations consistent with beliefs about states of the world. The important thing here is that both action and perception are in the game of minimising the same quantity, namely variational free energy. A key aspect of this formulation is that action (i.e. behaviour) is absorbed into inference, which means that agents have beliefs about what they are doing and will do. This calls for prior beliefs about action or policies (i.e. sequences of actions). So where do these prior beliefs come from?

The answer obtains from a *reductio ad absurdum* argument: if action realises prior beliefs and minimises free energy, then the only tenable prior beliefs are that action will minimise free energy. If this were not the case, we reach the following absurd conclusion. If a free energy minimising creature did not have the prior belief that it selects policies that minimise (expected) free energy, it would infer (and therefore pursue) policies that were not free energy minimising. As such, it would not be a free energy minimising creature, which is a contradiction. This leads to the prior belief that I will select policies that minimise the free energy expected under that policy. The endpoint of this argument is that action or *policy selection becomes a form of Bayesian model selection*, where the evidence for a particular policy becomes the free energy expected in the future. This *expected free energy* is a slightly unusual objective function because it scores the evidence for plausible policies based on outcomes that have yet to be observed. This means that the expected free energy becomes the variational free energy expected under (posterior predictive) beliefs about outcomes. These priors are usually informed by prior beliefs about outcomes that play the role of prior preferences or utility functions in reinforcement learning and economics.

In summary, beliefs about states of the world and policies are continuously updated to minimise variational free energy, where posterior beliefs about policies (that prescribe action) are based upon expected free energy (that may or may not include prior preferences over future outcomes). This is the current story and leads to interesting issues that rest on the fact that expected free energy can be decomposed into epistemic and pragmatic parts (Friston et al. 2015). This decomposition provides a principled explanation for the epistemics of planning and inference that underwrite the exploitation and exploration dilemma, novelty, salience and so on. However, there is another way of telling this story that leads to a conceptually different sort of interpretation.

In what follows, we show that the same Bayesian policy (model) selection obtains from minimising variational free energy when *future outcomes are treated as hidden or latent states of the world*. In other words, we can regard active inference as minimising a generalised free energy under generative models that entertain the consequences of (policy-dependent) hidden states of the world in the future. This simple generalisation induces posterior beliefs over future outcomes that now play the role of latent or hidden states. In this setting, the future is treated in exactly the same way as the hidden or unobservable states of the world generating observations in the past. On this view, one gets the expected free energy for free, because the variational free energy involves an expectation under posterior beliefs over future outcomes. In turn, this means that beliefs about states and policies can be simply and uniformly treated as minimising the same (generalised) free energy, without having to invoke any free energy minimising priors over policies.

Technically, this leads to the same form of belief updating and (Bayesian) policy selection but provides a different perspective on the free energy principle per se. This perspective says that self-evidencing and active inference both have one underlying imperative, namely to minimise *generalised free energy* or uncertainty. When this uncertainty is evaluated under models that generate outcomes in the future, future outcomes become hidden states that are only revealed by the passage of time. In this context, outcomes in the past become observations in standard variational inference, while outcomes in the future become posterior beliefs about latent observations that have yet to disclose themselves. In this way, the generalised free energy can be seen as comprising variational free energy contributions from the past and future.

The current paper provides the formal basis for the above arguments. In brief, we will see that both the expected and generalised free energy formulations lead to the same update equations. However, there is a subtle difference. In the expected free energy formalism, prior preferences or beliefs about outcomes are used to specify the prior over policies. In the generalised formulation, prior beliefs about outcomes in the future inform posterior beliefs about the hidden states that cause them. Because of the implicit forward and backward message passing in the belief propagation scheme obtained at the free energy minimum (Yedidia et al. 2005), these prior beliefs or preferences act to distort expected trajectories (into the future) towards preferences in an optimistic way (Sharot et al. 2012). Intuitively, the expected free energy contribution to generalised free energy evaluates the (complexity) cost of this distortion, thereby favouring policies that lead naturally to preferred outcomes—without violating beliefs about state transitions and the (likelihood) mapping between states and outcomes. The implicit coupling between beliefs about the future and current actions means that, in one sense, the future can cause the past.

Framing probabilistic reasoning in terms of inferential message passing formalises several prominent concepts in the study of human decision making. The idea that prior beliefs distort beliefs about the future, and that this optimism about the future propagates backwards in time to influence behaviour in an adaptive way (McKay and Dennett 2010; Sharot 2011), is highly consistent with an influence of beliefs about the future over beliefs about the present. Simplistically, the idea behind these accounts is that adaptive behaviour relies upon the (possibly false) belief that future events will accord with our preferences. It is only by believing that we will realise these goals that we act in a manner consistent with their realisation. Intuitively, without the belief that we will end up eating dinner, there would be no reason to shop for ingredients. The passing of messages from past to future resonates with the notion that working memory is vital for predicting the future and planning actions accordingly (Gilhooly 2005; Hikosaka et al. 2000), and underwrites research on episodic future thinking and counterfactual reasoning (Schacter et al. 2015). Appealing to bidirectional inferential message passing has enabled us to reproduce a range of behavioural and electrophysiological phenomena through simulation (summarised in Table 1).

This paper comprises three sections. In the first, we outline the approach we have used to date (i.e. minimising the variational free energy under prior beliefs that policies with a low expected free energy are more probable). In the second, we introduce a generalisation of the variational free energy that incorporates beliefs about future outcomes. The third section compares these two approaches conceptually and through illustrative simulations.

## 2 Active inference and variational free energy

The Markov decision processes considered here involve hidden states (*s*) that evolve through time. At each time step, the probability of transitioning from one state to the next depends upon a policy (*π*). Neither states nor policies are directly accessible to the creature in question. However, each state probabilistically generates an observable outcome (*o*). As Jensen’s inequality demonstrates, free energy is an upper bound on surprise.
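A standard way to write this bound, using the joint distribution \( P \) and the approximate posterior \( Q \) described in this section, is sketched below. It is a reconstruction from those definitions and is not claimed to match the original display symbol for symbol.

```latex
\begin{aligned}
-\ln P(\tilde{o})
  &= -\ln \sum_{\tilde{s},\pi} P(\tilde{o},\tilde{s},\pi)
   = -\ln E_{Q(\tilde{s},\pi)}\left[\frac{P(\tilde{o},\tilde{s},\pi)}{Q(\tilde{s},\pi)}\right] \\
  &\le E_{Q(\tilde{s},\pi)}\left[\ln Q(\tilde{s},\pi) - \ln P(\tilde{o},\tilde{s},\pi)\right] = F
\end{aligned}
```

The inequality is Jensen's inequality applied to the concave logarithm, so \( F \) can never fall below the surprise \( -\ln P(\tilde{o}) \).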

In the equation above, *P* indicates a probability distribution over outcomes \( \tilde{o} = (o_{1} ,o_{2} , \ldots ,o_{T} ) \) that are generated by hidden states of the world \( \tilde{s} = (s_{1} ,s_{2} , \ldots ,s_{T} ) \) and policies, which define the generative model. The generative model is thus expressed as a joint probability distribution over outcomes (i.e. consequences) and their causes (i.e. hidden states of the world and policies available to the agent). Marginalising (i.e. summing or integrating) over the states and policies gives the evidence (a.k.a., marginal likelihood). The log of this marginal likelihood is negative surprise. *Q* is a probability distribution over unobservable (hidden) states and policies—that becomes an approximate posterior distribution as free energy is minimised. The minimisation of free energy over time ensures entropy does not increase, thereby enabling biological systems to resist the second law of thermodynamics and their implicit dissipation or decay.

Note that the generative model is not a model of the biological system itself, but an implicit model of how the environment generates its sensory data. The dynamics of inference and behaviour that we are interested in here emerge from minimising free energy under an appropriate choice of generative model. For readers with a physics background, an analogy would be that the free energy plays the role of a Lagrangian whose ‘potential energy’ component is given by the generative model. Just as a Lagrangian is used to recover the equations of motion for a physical system, we use the free energy to recover the belief updates that determine a biological system’s behaviour.

In the following, we begin by describing the form of the generative model we have used to date. We will then address the form of the approximate posterior distribution. To make inference tractable, this generally involves a mean-field approximation that factorises the approximate posterior distribution into independent factors or marginal distributions.

The role of the generative model is simply to define the free energy functional which, as we will see in Sects. 2.3–2.5, gives rise to the belief update rules that we will employ for our simulations. However, it is helpful to imagine how we might generate data from such a model. We outline this process with the model on the left of Fig. 1 in mind. We could start at the first time step and sample a state from the categorical prior over initial states. The parameters of this prior (**D**) are simply a vector of probabilities for each alternative state. From this, we can now sample from the likelihood. This is formulated as a matrix (**A**), whose columns correspond to a state and whose rows are the alternative outcomes that may be generated. To generate an outcome, we would select the column of this matrix corresponding to the state we sampled and sample an outcome from this column-vector of probabilities. It is this outcome that would be available to a synthetic creature.

Taking a discrete time step into the future, we can sample a new state from the column of a transition matrix (**B**) associated with the state at the previous time. Crucially, the transition probabilities are conditioned upon the selected action. This means we have a separate **B**-matrix for each action. Action selection depends upon the policy, with each policy and time point associated with an action. For the model on the left of Fig. 1, this means we calculate the expected free energy (**G**) for each policy, which depends upon a vector of prior probabilities for outcomes under these policies (**C**). Combining these with a prior bias term (**E**)—as set out in more detail in Sect. 2.4—we can construct a prior over policies. Sampling from this and selecting the action that corresponds to this policy, at this time, specify the **B**-matrix from which to sample the state for the current time step. We could then sample the outcome for this time from the relevant column of the **A**-matrix. This process can be repeated for a series of discrete time steps, generating a new outcome for each time. A similar approach could be taken to generate data from the model on the right of Fig. 1. However, note that the likelihood here comprises both **A** and **C**, and the policy prior only includes **E** (i.e. the expected free energy does not explicitly feature in this model). The procedure outlined above provides an intuition into the beliefs a creature has about how its sensory data are generated by acting on hidden states in the environment.
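The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `sample_trajectory`, the array layout of **B** (one matrix per action) and the toy matrices are assumptions, and the policy is passed in directly rather than sampled from a prior built from **G** and **E**.

```python
import numpy as np

def sample_trajectory(D, A, B, policy, rng):
    """Sample hidden states and outcomes under one fixed policy.

    D      : (n_states,) prior over the initial state
    A      : (n_outcomes, n_states) likelihood; column j is P(o | s = j)
    B      : (n_actions, n_states, n_states); B[u][:, j] is P(s' | s = j, u)
    policy : sequence of action indices, one per transition (assumed given)
    """
    n_outcomes, n_states = A.shape
    s = int(rng.choice(n_states, p=D))                 # initial state from D
    states = [s]
    outcomes = [int(rng.choice(n_outcomes, p=A[:, s]))]
    for u in policy:
        s = int(rng.choice(n_states, p=B[u][:, s]))    # column of B for the previous state
        states.append(s)
        outcomes.append(int(rng.choice(n_outcomes, p=A[:, s])))
    return states, outcomes

# Toy two-state example with deterministic matrices (hypothetical values).
D = np.array([1.0, 0.0])
A = np.eye(2)
B = np.stack([np.eye(2),                               # action 0: stay
              np.array([[0.0, 1.0], [1.0, 0.0]])])     # action 1: switch
states, outcomes = sample_trajectory(D, A, B, policy=[1, 1, 0],
                                     rng=np.random.default_rng(0))
print(states, outcomes)
```

With stochastic columns in **A** and **B**, repeated calls would give different trajectories; only the outcomes would be available to the synthetic creature.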

It is worth noting that the free energy is a *functional* of the distributions in the generative model and of the approximate posterior beliefs, but a *function* of observations. Continuing with this free energy, we now consider the mean-field approximation in current implementations of active inference, and its consequences for the variational free energy. In the next few sections, we unpack the variational free energy, and its role in active inference based on Markov decision processes. The argument that follows is a little involved, but we summarise the key steps here, such that the agenda of each of the following sections is clear. In Sect. 2.1, we specify the form of the variational distribution we employ, and the free energy that results from this. In Sect. 2.2, we unpack the terms in the free energy as they pertain to the generative model. This depends upon having a prior belief about policies. Section 2.3 attempts to identify this prior, through finding the optimal posterior and extrapolating backwards in time. This highlights a shortcoming of this approach that is resolved in Sect. 2.4. In addition to providing a more appropriate prior for policy selection, Sect. 2.4 sets out the role of free energy in simulating behaviour. In brief, this involves finding the variational distribution over policies that minimises free energy. As free energy is a function of sensory observations, this means we need to update these distributions following each new observation. Section 2.5 follows the same approach to find the free energy minima for beliefs about states, giving the fixed points to which these distributions must be updated following each new observation.

### 2.1 Definition of the mean-field variational free energy

*P*), which include the interactions, under the fully factorised distribution (*Q*). The advantage to using a mean-field approximation is the computational tractability that comes from being able to separately optimise each marginal distribution. We can now substitute this factorised distribution into our definition for the variational free energy above:
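As a sketch of this substitution, assume the factorisation \( Q(\tilde{s},\pi) = Q(\pi)\prod_{\tau} Q(s_{\tau}|\pi) \), which matches the policy-conditioned state beliefs \( Q(s_{\tau}|\pi) \) used elsewhere in the paper. Under that assumption, and writing \( F_{\pi} \) for the policy-dependent term, the free energy separates as follows.

```latex
\begin{aligned}
Q(\tilde{s},\pi) &= Q(\pi)\prod_{\tau=1}^{T} Q(s_{\tau}|\pi) \\
F &= E_{Q(\tilde{s},\pi)}\left[\ln Q(\tilde{s},\pi) - \ln P(\tilde{o},\tilde{s},\pi)\right] \\
  &= E_{Q(\pi)}\left[F_{\pi}\right] + D_{KL}\left[Q(\pi)\,\|\,P(\pi)\right] \\
F_{\pi} &= E_{Q(\tilde{s}|\pi)}\left[\ln Q(\tilde{s}|\pi) - \ln P(\tilde{o},\tilde{s}|\pi)\right]
\end{aligned}
```

The last line follows by splitting \( \ln Q(\tilde{s},\pi) = \ln Q(\pi) + \ln Q(\tilde{s}|\pi) \) and \( \ln P(\tilde{o},\tilde{s},\pi) = \ln P(\pi) + \ln P(\tilde{o},\tilde{s}|\pi) \).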

In this form, the variational free energy is expressed in terms of policy-dependent terms (second equality) that bound the (negative log) evidence for each policy and a complexity cost or KL divergence^{1} (*D*_{KL}) that scores the departure of the posterior beliefs over policies from the corresponding prior beliefs.

### 2.2 Past and future

*beliefs* about states. In other words, the free energy is a functional of distributions over states, rather than a function, as in the case of outcomes. This means that free energy evaluation takes account of future states. We can express this explicitly by writing the variational free energy, at time *t*, as a sum over all time steps, factorising the generative distribution according to the conditional independencies expressed in Fig. 1:

*t*) the free energy should, strictly speaking, be written as a function of *t* and *τ*. As we are interested here in online inference, we will assume an implicit conditioning upon *t* for all free energies throughout this paper. The Iverson brackets above allow us to decompose the sum into past and future components:

In this decomposition, the contribution of beliefs about future states reduces to a complexity cost. This is the KL divergence between approximate posterior beliefs about states in the future and prior beliefs. The latter are based upon the (policy-specific) transition probabilities in the generative model.
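One way to write the sum over time steps and its past and future components, consistent with the description in this section, is sketched below. It assumes the mean-field form \( Q(s_{\tau}|\pi) \) for state beliefs and is offered as a reconstruction, not as the original display. For \( \tau = 1 \), the transition term would be replaced by the prior over initial states.

```latex
\begin{aligned}
F_{\pi} &= \sum_{\tau} F_{\pi\tau} \\
F_{\pi\tau} &= E_{Q(s_{\tau}|\pi)Q(s_{\tau-1}|\pi)}\left[\ln Q(s_{\tau}|\pi) - [\tau \le t]\,\ln P(o_{\tau}|s_{\tau}) - \ln P(s_{\tau}|s_{\tau-1},\pi)\right] \\
&= \begin{cases}
E_{Q(s_{\tau}|\pi)Q(s_{\tau-1}|\pi)}\left[\ln Q(s_{\tau}|\pi) - \ln P(o_{\tau}|s_{\tau}) - \ln P(s_{\tau}|s_{\tau-1},\pi)\right] & \tau \le t \\
E_{Q(s_{\tau-1}|\pi)}\left[D_{KL}\left[Q(s_{\tau}|\pi)\,\|\,P(s_{\tau}|s_{\tau-1},\pi)\right]\right] & \tau > t
\end{cases}
\end{aligned}
```

The Iverson bracket \( [\tau \le t] \) is one for past and present time steps and zero for future ones, which is why the likelihood term drops out of the second case.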

### 2.3 Policy posteriors and priors

### 2.4 Expected free energy

The prior belief that policies minimise expected free energy follows from the *reductio ad absurdum* in the introduction and is expressed mathematically as:

(We use the notation \( Q_{o}(\pi) \) for the prior here to distinguish this from the fixed form prior \( P(\pi) \), which does not depend on the beliefs about states.)

\( G_{\pi} \) is the expected free energy, conditioned on a policy. It is defined as:

This form shows that policies that have a low expected free energy are those that resolve uncertainty, and that fulfil prior preferences about outcomes. It is the first of these terms that endorses the metaphor of the brain as a scientist, performing experiments (i.e. actions with sensory consequences) to verify or refute hypotheses about the world (Friston et al. 2012b; Gregory 1980). The second term speaks to the notion of a ‘crooked scientist’ (Bruineberg et al. 2016), who designs experiments to confirm prior beliefs, i.e. preferred outcomes. This preference is the same as the evidence (a.k.a., marginal likelihood) associated with a given model. This means policies are selected such that the most probable outcomes under that policy match the most probable outcomes under prior preferences (defined in terms of a marginal likelihood).

In this equation, *H* is the Shannon entropy (i.e. negative expected log probability). This means that the prior belief about outcomes enters the generative model through the KL divergence between outcomes expected under any policy and prior preferences. This form also illustrates the correspondence between the expected free energy and the quantities ‘risk’ and ‘ambiguity’ from behavioural economics (Ellsberg 1961; Ghirardato and Marinacci 2002). Risk quantifies the expected cost of a policy as a divergence from preferred outcomes and is sometimes referred to as Bayesian risk or regret (Huggins and Tenenbaum 2015), which underlies KL control and related Bayesian control rules (Kappen et al. 2012; Ortega and Braun 2010; Todorov 2008) and special cases that include Thompson sampling (Lloyd and Leslie 2013; Strens 2000). Ambiguous states are those that have an uncertain mapping to observations. The greater these quantities, the less likely it is that the associated policy will be chosen.
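The risk and ambiguity terms can be computed directly from a likelihood matrix, a preference vector and a predicted state belief. The sketch below covers a single policy and a single future time step; the function name `expected_free_energy`, the small constant `eps` and the toy numbers are assumptions for illustration. Entropy is taken with the conventional sign, \( -\sum p \ln p \).

```python
import numpy as np

def expected_free_energy(A, C, s, eps=1e-16):
    """Risk plus ambiguity for one policy and one future time step.

    A : (n_outcomes, n_states) likelihood, columns sum to one
    C : (n_outcomes,) prior preferences over outcomes (a probability vector)
    s : (n_states,) predicted state belief Q(s_tau | pi)
    """
    o = A @ s                                          # predicted outcomes Q(o_tau | pi)
    # Risk: KL divergence between predicted and preferred outcomes.
    risk = float(np.sum(o * (np.log(o + eps) - np.log(C + eps))))
    # Ambiguity: expected entropy of the likelihood mapping.
    H = -np.sum(A * np.log(A + eps), axis=0)           # entropy of each column of A
    ambiguity = float(H @ s)
    return risk + ambiguity, risk, ambiguity

# Hypothetical example: state 0 maps precisely to outcomes, state 1 does not.
A = np.array([[0.9, 0.5],
              [0.1, 0.5]])
C = np.array([0.8, 0.2])                               # outcome 0 is preferred
G_informative, _, _ = expected_free_energy(A, C, np.array([1.0, 0.0]))
G_ambiguous, _, _ = expected_free_energy(A, C, np.array([0.0, 1.0]))
print(G_informative < G_ambiguous)
```

A policy expected to lead to the ambiguous state scores worse on both terms here, so it would be less likely to be selected.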

This highlights the way in which the expected free energy influences policy selection. Distributions over policies are updated at each time step to a fixed point that depends upon the expected free energy. The expected free energy is a functional of posterior beliefs about states. Section 2.5 sets out how these may be optimised in relation to sensory outcomes.
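The fixed point for policy beliefs can be illustrated with a softmax. The sketch assumes that the prior over policies is proportional to \( \mathbf{E}\exp(-\mathbf{G}) \); given that prior, minimising \( E_{Q(\pi)}[F_{\pi}] + D_{KL}[Q(\pi)\,\|\,Q_{o}(\pi)] \) yields a posterior proportional to the prior times \( \exp(-\mathbf{F}) \). Function names and numerical values are hypothetical.

```python
import numpy as np

def softmax(x):
    """Normalised exponential, shifted for numerical stability."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def policy_beliefs(E, F, G):
    """Prior and posterior over policies under the assumed softmax form."""
    prior = softmax(np.log(E) - G)             # depends on expected free energy only
    posterior = softmax(np.log(E) - F - G)     # also weighs evidence from past outcomes
    return prior, posterior

def policy_free_energy(q, F, prior):
    """E_q[F_pi] + KL[q || prior]: the policy-dependent part of the free energy."""
    return float(q @ F + q @ (np.log(q) - np.log(prior)))

E = np.ones(3) / 3                             # flat fixed-form prior (hypothetical)
F = np.array([1.0, 2.0, 1.5])                  # variational free energy per policy
G = np.array([0.5, 0.2, 3.0])                  # expected free energy per policy
prior, posterior = policy_beliefs(E, F, G)
print(np.argmax(prior), np.argmax(posterior))
```

Because **F** changes with each new observation, the posterior has to be recomputed at every time step even when **E** and **G** stay fixed.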

### 2.5 Hidden state updates

Variables in update equations

Variable | Definition |
---|---|
\( {\mathbf{F}} = \left[ { \ldots ,F_{\pi } , \ldots } \right]^{T} \) | Variational free energy |
\( {\mathbf{G}} = [ \ldots ,G_{\pi } , \ldots ]^{T} \) | Expected free energy |
\( {\boldsymbol{\mathcal{F}}} = [ \ldots ,\mathcal{F}_{\pi } , \ldots ]^{T} \) | Generalised free energy |
\( \begin{aligned} & {\varvec{\uppi}}_{{\mathbf{o}}} ;\,{\varvec{\uppi}}_{{{\mathbf{o}}i}} = Q_{0} (\pi = i) \\ & {\varvec{\uppi}};\,{\varvec{\uppi}}_{i} = Q(\pi = i) \\ \end{aligned} \) | Policy prior and posterior |
\( {\mathbf{s}}_{\pi \tau } ;\,{\mathbf{s}}_{\pi \tau i} = Q(s_{\tau } = i|\pi ) \) | State belief (for a given policy and time) |
\( {\mathbf{o}}_{\pi \tau } ;\,{\mathbf{o}}_{\pi \tau i} = Q(o_{\tau } = i|\pi ) \) | Outcome belief (for a given policy and time) |
\( o_{\tau } \) | Outcome |
\( {\mathbf{A}};\,{\mathbf{A}}_{ij} = P(o_{\tau } = i|s_{\tau } = j) \) | Likelihood matrix (mapping states to outcomes) |
\( {\mathbf{B}};\,{\mathbf{B}}_{\pi \tau ij} = P(s_{\tau + 1} = i|s_{\tau } = j,\pi ) \) | Transition matrix (mapping states to states) |
\( {\mathbf{C}};\,{\mathbf{C}}_{\tau i} = P(o_{\tau } = i) \) | Outcome prior |
\( {\mathbf{E}};\,{\mathbf{E}}_{i} = P(\pi = i) \) | Fixed form policy prior |
\( {\mathbf{H}};\,{\mathbf{H}}_{i} = - \sum\limits_{j} {P(o_{\tau } = j|s_{\tau } = i)\ln P(o_{\tau } = j|s_{\tau } = i)} \) | Entropy of the likelihood mapping |

### 2.6 Summary

In the above, we have provided an overview of our approach to date. This uses a variational free energy functional to derive belief updates, while policy selection is performed based on an expected free energy. The resulting update equations are shown in Fig. 3 (blue panels). This formulation has been very successful in explaining a range of cognitive functions, as summarised in Table 1. In the following, we present an alternative line of reasoning. As indicated in Fig. 2, there is more than one way to think about the data assimilation and evidence accumulation implicit in this formulation. So far, we have considered the addition of new observations as time progresses. We now consider the case in which (future) outcomes are represented throughout time. This means that future or latent outcomes have the potential to influence beliefs about past states.

## 3 Active inference and generalised free energy

The *δ* here is a Kronecker delta function (a discrete version of a Dirac delta) that is one when the arguments are equal, and zero otherwise. The starred (*) argument indicates the data we have actually observed. In the generalised free energy, the marginals of the joint distribution over outcomes and states define the entropy but the expectation is over the joint distribution. It is important to note that \( Q(o_{\tau } ,s_{\tau } |\pi ) \ne Q(o_{\tau } |\pi )Q(s_{\tau } |\pi ) \). It is this inequality that underlies the epistemic components of generalised free energy. Interestingly, if we assumed conditional independence between outcomes and hidden states, \( Q(o_{\tau } ,s_{\tau } |\pi ) = Q(o_{\tau } |\pi )Q(s_{\tau } |\pi ) \), the resulting belief update equations would correspond exactly to a variational message passing algorithm (Dauwels 2007) applied to a model with missing data.

Here, we have defined the distribution over states and observations in terms of two independent factors, a likelihood and a prior over observations, i.e. preferred observations conditioned on the model. For simplicity, we will omit the explicit conditioning on \( m \), so that \( P(\tilde{o}|m) = P(\tilde{o}) \). This quantity plays exactly the same role as that of the preferences in the formulation described in the previous section. However, while it has the same influence over policy selection, it can no longer be interpreted as model evidence. Instead, it is a policy-independent prior that contributes to the evidence.

To obtain the mutual information term, we have used the relationship \( \ln P(o_{\tau } |s_{\tau } ) = \ln Q(o_{\tau } |s_{\tau } ) = \ln Q(o_{\tau } ,s_{\tau } |\pi ) - \ln Q(s_{\tau } |\pi ) \). The imperative to maximise the mutual information (Barlow 1961, 1974; Linsker 1990; Optican and Richmond 1987) can be interpreted as an epistemic drive (Denzler and Brown 2002). This is because policies that (are believed to) result in observations that are highly informative about the hidden states are associated with a lower generalised free energy. As a KL divergence is always greater than or equal to zero, the second equality indicates that the free energy of the expected future is an upper bound on expected surprise.

The final term for future beliefs implies that future states are considered more probable if they are expected to be similar to those that generate preferred outcomes. In other words, there is an optimistic distortion of beliefs about the trajectory into the future.
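A form of the generalised free energy for a future time step that is consistent with the prose of this section is sketched below. It is a reconstruction under stated assumptions and is not the original display: the joint belief is taken to be \( Q(o_{\tau},s_{\tau}|\pi) = P(o_{\tau}|s_{\tau})Q(s_{\tau}|\pi) \), and the generative model contributes the factors \( P(o_{\tau}|s_{\tau}) \), \( P(o_{\tau}) \) and \( P(s_{\tau}|s_{\tau-1},\pi) \). For \( \tau > t \):

```latex
\begin{aligned}
\mathcal{F}_{\pi\tau}
 &= E_{Q(o_{\tau},s_{\tau}|\pi)Q(s_{\tau-1}|\pi)}\big[\ln Q(o_{\tau}|\pi) + \ln Q(s_{\tau}|\pi)
    - \ln P(o_{\tau}|s_{\tau}) - \ln P(o_{\tau}) - \ln P(s_{\tau}|s_{\tau-1},\pi)\big] \\
 &= \underbrace{E_{Q(s_{\tau-1}|\pi)}\big[D_{KL}[Q(s_{\tau}|\pi)\,\|\,P(s_{\tau}|s_{\tau-1},\pi)]\big]}_{\text{complexity}}
  - \underbrace{E_{Q(o_{\tau},s_{\tau}|\pi)}\big[\ln Q(o_{\tau},s_{\tau}|\pi) - \ln Q(o_{\tau}|\pi) - \ln Q(s_{\tau}|\pi)\big]}_{\text{mutual information}}
  - E_{Q(o_{\tau}|\pi)}\big[\ln P(o_{\tau})\big]
\end{aligned}
```

The second line uses \( \ln P(o_{\tau}|s_{\tau}) = \ln Q(o_{\tau},s_{\tau}|\pi) - \ln Q(s_{\tau}|\pi) \). Policies whose outcomes are informative about hidden states have a larger mutual information and therefore a lower generalised free energy.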

### 3.1 Summary

We have introduced a generalised free energy functional that is expressed as a functional of beliefs about data. The variational free energy can be seen as a special case of this generalised functional, when beliefs about outcomes collapse to delta functions. When we derive update equations (Fig. 3, pink panels) under this functional, the updates look very similar to those based on the variational free energy approach. An important difference between the two approaches is that we have now included the prior probability of outcomes in the generative model. This has no influence over beliefs about the past, but distorts beliefs about the future in an optimistic fashion. This formulation generalises not only the standard active inference formalism, but also active data selection or sensing approaches in machine learning (MacKay 1992) and computational neuroscience (Yang et al. 2016b). See “Appendix A” for a discussion of the relationship between these.

## 4 Comparison of active inference under expected and generalised free energy

The generalised free energy has the appeal that belief updating and policy selection both minimise the same objective function. In contrast, formulations of active inference to date have required two different quantities (the variational free energy and the expected free energy, respectively) to derive these processes. Although the form of belief updating is the same, the belief updates resulting from the use of a generalised free energy are different in subtle ways. In this section, we will explore these differences and show how generalised active inference reproduces the behaviours illustrated in our earlier papers.

The notable differences between the updates are found in the policy prior, the treatment of outcomes and the future hidden state updates. The prior over policies is very similar in both formulations. The expected and generalised free energy (at \( \tau = 0 \)) differ only in that there is an additional complexity term in the latter. This has a negligible influence on behaviour, as the first action is performed *after* observations have been made at the first time step. At this point, the posterior belief about policies is identical, as the variational free energy supplies the missing complexity term. Although the priors are different, in both form and motivation, the posterior beliefs turn out to be computed identically. Any difference in these can be attributed to the quantities used to calculate them, namely the outcomes and the hidden states.
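The claim that the two quantities differ by a complexity term can be checked numerically for a single future time step. The sketch below assumes a particular form for the generalised free energy, namely the expectation of \( \ln Q(o) + \ln Q(s) - \ln P(o|s) - \ln P(o) - \ln P(s|s_{\text{prev}}) \) under \( Q(o,s) = P(o|s)Q(s) \) and \( Q(s_{\text{prev}}) \). That form is inferred from the prose, and the function name and values are hypothetical. All probabilities are kept strictly positive so that logarithms are finite.

```python
import numpy as np

def future_terms(A, B, C, s, s_prev):
    """Generalised free energy, expected free energy and complexity for one step.

    A      : (n_outcomes, n_states) likelihood
    B      : (n_states, n_states) transition matrix for the policy's action
    C      : (n_outcomes,) prior over outcomes
    s      : (n_states,) belief about the future state
    s_prev : (n_states,) belief about the preceding state
    """
    ln = np.log
    joint = A * s                                   # Q(o, s) = P(o | s) Q(s)
    o = joint.sum(axis=1)                           # Q(o)
    exp_ln_B = s @ ln(B) @ s_prev                   # E[ln P(s | s_prev)]
    generalised = np.sum(
        joint * (ln(o)[:, None] + ln(s)[None, :] - ln(A) - ln(C)[:, None])
    ) - exp_ln_B
    risk = o @ (ln(o) - ln(C))                      # KL[Q(o) || P(o)]
    ambiguity = -np.sum(joint * ln(A))              # expected entropy of P(o | s)
    complexity = s @ ln(s) - exp_ln_B               # expected KL[Q(s) || P(s | s_prev)]
    return float(generalised), float(risk + ambiguity), float(complexity)

A = np.array([[0.9, 0.2], [0.1, 0.8]])
B = np.array([[0.7, 0.3], [0.3, 0.7]])
C = np.array([0.75, 0.25])
gen, G, complexity = future_terms(A, B, C,
                                  s=np.array([0.6, 0.4]),
                                  s_prev=np.array([0.5, 0.5]))
print(np.isclose(gen, G + complexity))
```

Under these assumptions the check holds for any strictly positive inputs, since the identity is algebraic rather than specific to the toy values.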

Outcomes in the generalised formulation are represented explicitly as beliefs. This means that the prior over outcomes is incorporated explicitly in the generative model. There are two important consequences of this. The first is that the posterior beliefs about future outcomes (i.e. the probability of future outcomes given those already observed) can be derived in a parsimonious way, without the need to define additional prior distributions. The second is that hidden state beliefs in the future are biased towards these preferred outcomes. A prior belief about an outcome at a particular time point thus distorts the trajectory of hidden states at each time point reaching back to the present. In addition to this, beliefs about hidden states in the future acquire an ‘ambiguity’ term. This means that states associated with an imprecise mapping to sensory outcomes are inferred to be less likely. In summary, not only are belief trajectories drawn in optimistic directions, they also tend towards states that offer informative observations.
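The projection of state beliefs to outcome beliefs, and the effect of an ambiguity penalty, can be illustrated with a small numerical sketch. This is not the scheme used in the simulations: the likelihood matrix `A`, the state beliefs `s` and the helper names are hypothetical, and the sketch assumes that ambiguity enters as the entropy of the likelihood subtracted from log beliefs about states.

```python
import math

def softmax(x):
    """Normalised exponential, shifted by the maximum for numerical stability."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]

def predict_outcomes(A, s):
    """Project beliefs about states s through the likelihood A[o][j] = P(o | state j)."""
    return [sum(a * b for a, b in zip(row, s)) for row in A]

def state_entropies(A):
    """Entropy of P(o | state j) for each state j (the ambiguity of that state)."""
    n_states = len(A[0])
    return [-sum(row[j] * math.log(row[j]) for row in A if row[j] > 0)
            for j in range(n_states)]

# Hypothetical likelihood: state 0 maps precisely to outcomes, state 1 does not.
A = [[0.98, 0.5],
     [0.02, 0.5]]
s = [0.5, 0.5]  # initially flat beliefs about the future state

o_pred = predict_outcomes(A, s)   # posterior predictive beliefs about outcomes
H = state_entropies(A)            # ambiguity of each state

# Assumed form of the penalty: subtract ambiguity from log beliefs, then renormalise.
s_biased = softmax([math.log(sj) - hj for sj, hj in zip(s, H)])
```

Under these assumptions, `s_biased` assigns more probability to state 0, whose mapping to outcomes is precise, than to the ambiguous state 1.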

In one arm of the maze, there is an unconditioned^{2} (rewarding) stimulus. In another, there is no stimulus, and this condition is considered aversive. In the final arm, there is always an instructional or conditioned stimulus that indicates the arm that contains the reward. There are two possible contexts for the maze. The first is that where the unconditioned stimulus is in the left arm and the second where it is in the right arm. The starting location and the location of the conditioned stimulus are neither aversive nor rewarding. Under each of the schemes illustrated here, the degree to which a stimulus is rewarding is expressed in terms of the prior preference (i.e. **C**). In other words, we can think of reward as the log probability of a given observation. The more probable an outcome is considered to be, the more attractive it appears to be. This is because policies that do not lead to these outcomes violate prior beliefs and are unlikely to be selected a posteriori. Please see “Appendix A” (term 4) for an interpretation of this that appeals to expected utility theory and risk aversion. There is an important distinction here between schemes based upon Bellman optimality and the scheme on offer here. This is that active inference depends upon probabilistic beliefs and does not assume direct access to knowledge about states of the world. Practically, this means that the agent has no direct access to the hidden states, but must infer them based upon the (observable) outcomes. The importance of this is that the information gain associated with an exploratory behaviour can be quantified by the change in beliefs (or uncertainty reduction) that this behaviour facilitates.
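The reading of reward as a log probability can be made concrete with a short sketch. The utilities, the predicted outcome distributions and the function names below are hypothetical. The sketch assumes that the prior preference **C** is obtained by normalising exponentiated utilities, and it scores each policy by the KL divergence (footnote 1) between its predicted outcomes and **C**.

```python
import math

def softmax(x):
    """Normalised exponential, shifted by the maximum for numerical stability."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]

def kl(q, p):
    """KL divergence D_KL[q || p] for discrete distributions (terms with q = 0 vanish)."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p) if qi > 0)

# Hypothetical utilities for three outcomes: neutral, rewarding, aversive.
utility = [0.0, 3.0, -3.0]
C = softmax(utility)               # prior preferences as a probability distribution
log_C = [math.log(c) for c in C]   # log probabilities play the role of reward

# Predicted outcomes under two hypothetical policies.
o_good = [0.1, 0.8, 0.1]   # mostly leads to the rewarding outcome
o_bad = [0.1, 0.1, 0.8]    # mostly leads to the aversive outcome

risk_good = kl(o_good, C)
risk_bad = kl(o_bad, C)
```

Differences in `log_C` reproduce differences in utility, and the policy whose predicted outcomes violate the prior preferences (`o_bad`) incurs the larger divergence, which is why it would be unlikely to be selected a posteriori.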

As Fig. 4 shows, regardless of the active inference scheme we use, the agent first samples the unrewarding but epistemically valuable cue location, which resolves uncertainty about the context. This entails moving from the initial location in the centre of the maze, where the agent is uncertain about the context, to the location with the conditioned stimulus. To decide on this move, the agent updated its beliefs about states of the world (**s**_{πτ}) in relation to the outcomes (*o*_{1}) available in the central location using the fixed-point solutions shown in the ‘hidden states’ panels of Fig. 3. It does so for beliefs about every time point from the start to the end of the (four step) planning horizon. As these belief updates were derived by finding the free energy minima, they necessarily minimise free energy. Once beliefs have been optimised, they may be used to compute the expected free energy (or the corresponding part of the generalised free energy) as in the ‘free energies’ panel of Fig. 3. These are then used to update beliefs about policies as in the ‘policy selection’ panel. In computing these free energies, we required a posterior predictive belief about outcomes, which can be obtained using the likelihood probabilities to project beliefs about states to beliefs about outcomes (‘outcomes’ panel of Fig. 3). Given that the context unambiguously determines the conditioned stimulus and that our agent is initially uncertain about the context, the greatest information gain (and therefore smallest expected or generalised free energy) is associated with policies that sample this cue location.
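The sequence described above (predict outcomes, score policies, then update beliefs about policies) can be sketched for a single time step. This is a toy illustration under stated assumptions, not the T-maze model itself: the two likelihood matrices, the flat preferences and the decomposition of expected free energy into a divergence from preferences plus an ambiguity term are simplifications chosen for the example, and all names are hypothetical.

```python
import math

def softmax(x):
    """Normalised exponential, shifted by the maximum for numerical stability."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    z = sum(e)
    return [v / z for v in e]

def predict_outcomes(A, s):
    """Posterior predictive beliefs about outcomes, with A[o][j] = P(o | state j)."""
    return [sum(a * b for a, b in zip(row, s)) for row in A]

def kl(q, p):
    """KL divergence between predicted outcomes q and preferences p."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p) if qi > 0)

def ambiguity(A, s):
    """Expected entropy of the likelihood under beliefs about states."""
    total = 0.0
    for j, sj in enumerate(s):
        total += sj * -sum(row[j] * math.log(row[j]) for row in A if row[j] > 0)
    return total

def expected_free_energy(A, s, C):
    """Assumed decomposition: divergence from preferences plus ambiguity."""
    return kl(predict_outcomes(A, s), C) + ambiguity(A, s)

s = [0.5, 0.5]                        # uncertain about the context (left or right)
C = [0.5, 0.5]                        # cue outcomes are neither rewarding nor aversive
A_cue = [[1.0, 0.0], [0.0, 1.0]]      # at the cue, context determines the outcome
A_centre = [[0.5, 0.5], [0.5, 0.5]]   # in the centre, outcomes say nothing about context

G = [expected_free_energy(A_cue, s, C), expected_free_energy(A_centre, s, C)]
q_pi = softmax([-g for g in G])       # beliefs about policies: [visit cue, stay]
```

In this toy case the policy that visits the cue has the smaller expected free energy, because its outcomes are unambiguous, and so it receives the higher posterior probability.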

On reaching the conditioned stimulus and observing the green conditioned stimulus (*o*_{2}), the agent again updates beliefs about states to their new fixed point. Here, the free energy minimum corresponds to the belief that the second context (with the unconditioned stimulus in the right arm) is in play. Having resolved uncertainty about the context of the maze, the agent proceeds to maximise its extrinsic reward by moving to the reward location and finding the unconditioned stimulus (*o*_{3}). This is consistent with the smaller expected and generalised free energies associated with policies that realise prior beliefs about outcomes (**C** in the ‘free energies’ panels of Fig. 3).

Taken together, Figs. 4 and 5 illustrate an interesting feature of the generalised formulation. Although subtle, at *t* = 1, beliefs about location at *τ* = 2 are different, as shown in Fig. 5. Specifically, locations 2 and 3 appear slightly more probable, at the expense of location 4. This illustrates that beliefs about the proximal future are distorted by beliefs about future outcomes. Similarly, at *t* = 2, the generalised scheme considers it more likely that it will transition to location 4 relative to the variational scheme. Referring back to Fig. 4, we see that this corresponds to an increased posterior probability for policy 10 at this time step. Here, beliefs about future states and outcomes have influenced beliefs about the plausibility of different behavioural options at the present. In this case, the agent believes that it will experience observations associated with states 2 and 3 in the distal future (*τ* = 4). This enhances the probability of being in states in the more proximal future that are consistent with transitions into states 2 or 3. As these are absorbing states (the probability of staying in those states, once occupied, is one), these states are highly consistent with a transition to themselves. This induces a belief that states 2 and 3 are more probable at time *τ* = 3. Note that, as there are other plausible states that could have transitioned into 2 and 3 at *τ* = 4, the probability of states 2 and 3 at *τ* = 3 is less than at *τ* = 4. The same reasoning explains the higher probability of 2 and 3 at *τ* = 2 (relative to the standard scheme), but with a lower probability relative to occupying these states at later times. If instead the agent believed there was a very low probability of ending up at the goal location, this would induce beliefs that those states that lead to these locations with high probability were themselves unlikely.

Another way of putting this is that if I had strong beliefs about where I were to end up, I could infer where I might have been immediately before this. This will depend upon the relative probabilities of going from plausible penultimate locations to the goal location. By propagating these back to the present, I will infer that the most probable trajectory is the one that leads to this goal, and will act to fulfil my beliefs about this trajectory. In the absence of, possibly false, beliefs about where I would end up, I would not end up acting to fulfil these beliefs.
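The backwards influence of beliefs about the distal future can be illustrated with a small Markov chain. Note the substitution: this sketch uses exact forward-backward smoothing, not the fixed-point updates of Fig. 3, and the three-state transition matrix `B`, the optimistic prior over the final state and all function names are hypothetical.

```python
def normalise(x):
    """Rescale non-negative values so that they sum to one."""
    z = sum(x)
    return [v / z for v in x]

def forward(B, s):
    """Predict the next state: next[i] = sum_j B[i][j] * s[j]."""
    return [sum(B[i][j] * s[j] for j in range(len(s))) for i in range(len(B))]

def backward(B, m):
    """Pass a message from the future to the past: prev[j] = sum_i B[i][j] * m[i]."""
    return [sum(B[i][j] * m[i] for i in range(len(m))) for j in range(len(B[0]))]

# B[i][j] = P(next state i | current state j).
# State 0 is the start; states 1 (goal) and 2 (elsewhere) are absorbing.
B = [[0.50, 0.0, 0.0],
     [0.25, 1.0, 0.0],
     [0.25, 0.0, 1.0]]
prior_final = [0.05, 0.9, 0.05]   # optimistic belief about the state at the final step
T = 4                             # planning horizon (tau = 1, ..., 4)

# Forward predictions, starting from certainty about the initial state.
alphas = [[1.0, 0.0, 0.0]]
for _ in range(T - 1):
    alphas.append(forward(B, alphas[-1]))

# Backward messages, starting from the prior over the final state.
betas = [prior_final]
for _ in range(T - 1):
    betas.insert(0, backward(B, betas[0]))

# Combine both sources of evidence at each time step.
smoothed = [normalise([a * b for a, b in zip(al, be)])
            for al, be in zip(alphas, betas)]

p_goal_forward = [a[1] for a in alphas]     # beliefs without the optimistic prior
p_goal_smoothed = [s[1] for s in smoothed]  # beliefs distorted by the prior
```

In this sketch the probability of occupying the goal at each future step exceeds the purely forward prediction, and it grows towards the end of the horizon, mirroring the pattern described for states 2 and 3 above.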

## 5 Conclusion

The generalised free energy introduced in this paper provides a new perspective on active inference. It unifies the imperatives to minimise variational free energy with respect to data, and expected free energy through model selection, under a single objective function. Like the expected free energy, this generalised free energy can be decomposed in several ways, giving rise to familiar information theoretic measures and objective functions in Bayesian reinforcement learning. Generalised free energy minimisation replicates the epistemic and reward seeking behaviours induced in earlier active inference schemes, but prior preferences now induce an optimistic distortion of belief trajectories into the future. This allows beliefs about outcomes in the distal future to influence beliefs about states in the proximal future and present. That these beliefs then drive policy selection suggests that, under the generalised free energy formulation, (beliefs about) the future can indeed cause the past.

## Footnotes

1. The KL divergence (also known as relative entropy or information gain) is defined as follows: \( D_{\text{KL}} [Q(x)||P(x)] \triangleq E_{Q(x)} [\ln Q(x) - \ln P(x)] \).

2. The terms *conditioned stimulus* and *unconditioned stimulus* are used in the sense of classical (Pavlovian) conditioning paradigms.

## Notes

### Acknowledgements

TP is supported by the Rosetrees Trust (Award Number 173346). KJF is a Wellcome Principal Research Fellow (Ref: 088130/Z/09/Z). The authors thank Dimitrije Markovic for his insightful comments on the manuscript.

### Compliance with ethical standards

### Conflict of interest

The authors declare that they have no conflict of interest.

## References

- Attias H (2003) Planning by probabilistic inference. In: Proceedings of the 9th international workshop on artificial intelligence and statistics
- Baker CL, Saxe R, Tenenbaum JB (2009) Action understanding as inverse planning. Cognition 113:329–349. https://doi.org/10.1016/j.cognition.2009.07.005
- Barlow H (1961) Possible principles underlying the transformations of sensory messages. In: Rosenblith W (ed) Sensory communication. MIT Press, Cambridge, pp 217–234
- Barlow HB (1974) Inductive inference, coding, perception, and language. Perception 3:123–134
- Beal MJ (2003) Variational algorithms for approximate Bayesian inference. University of London, London
- Benrimoh D, Parr T, Vincent P, Adams RA, Friston K (2018) Active inference and auditory hallucinations. Comput Psychiatry 2:183–204. https://doi.org/10.1162/cpsy_a_00022
- Botvinick M, Toussaint M (2012) Planning as inference. Trends Cognit Sci 16:485–488
- Brown H, Friston KJ (2012) Free-energy and illusions: the Cornsweet effect. Front Psychol 3:43. https://doi.org/10.3389/fpsyg.2012.00043
- Bruineberg J, Kiverstein J, Rietveld E (2016) The anticipating brain is not a scientist: the free-energy principle from an ecological-enactive perspective. Synthese. https://doi.org/10.1007/s11229-016-1239-1
- Bruineberg J, Rietveld E, Parr T, van Maanen L, Friston KJ (2018) Free-energy minimization in joint agent-environment systems: a niche construction perspective. J Theor Biol 455:161–178. https://doi.org/10.1016/j.jtbi.2018.07.002
- Daunizeau J, Preuschoff K, Friston K, Stephan K (2011) Optimizing experimental design for comparing models of brain function. PLOS Comput Biol 7:e1002280. https://doi.org/10.1371/journal.pcbi.1002280
- Dauwels J (2007) On variational message passing on factor graphs. In: IEEE international symposium on information theory, ISIT 2007. IEEE, pp 2546–2550
- Dayan P, Hinton GE, Neal RM, Zemel RS (1995) The Helmholtz machine. Neural Comput 7:889–904
- Denzler J, Brown CM (2002) Information theoretic sensor data selection for active object recognition and state estimation. IEEE Trans Pattern Anal Mach Intell 24:145–157. https://doi.org/10.1109/34.982896
- El-Gamal MA (1991) The role of priors in active bayesian learning in the sequential statistical decision framework. In: Grandy WT, Schick LH (eds) Maximum entropy and Bayesian methods: Laramie, Wyoming, 1990. Springer Netherlands, Dordrecht, pp 33–38. https://doi.org/10.1007/978-94-011-3460-6_3
- Ellsberg D (1961) Risk, ambiguity, and the savage axioms. Q J Econ 75:643–669. https://doi.org/10.2307/1884324
- FitzGerald T, Dolan R, Friston K (2014) Model averaging, optimal inference, and habit formation. Front Hum Neurosci. https://doi.org/10.3389/fnhum.2014.00457
- FitzGerald TH, Dolan RJ, Friston K (2015a) Dopamine, reward learning, and active inference. Front Comput Neurosci 9:136. https://doi.org/10.3389/fncom.2015.00136
- FitzGerald TH, Moran RJ, Friston KJ, Dolan RJ (2015b) Precision and neuronal dynamics in the human posterior parietal cortex during evidence accumulation. Neuroimage 107:219–228. https://doi.org/10.1016/j.neuroimage.2014.12.015
- FitzGerald TH, Schwartenbeck P, Moutoussis M, Dolan RJ, Friston K (2015c) Active inference, evidence accumulation, and the urn task. Neural Comput 27:306–328. https://doi.org/10.1162/neco_a_00699
- Friston K (2003) Learning and inference in the brain. Neural Netw 16:1325–1352. https://doi.org/10.1016/j.neunet.2003.06.005
- Friston K, Buzsaki G (2016) The functional anatomy of time: what and when in the brain. Trends Cognit Sci. https://doi.org/10.1016/j.tics.2016.05.001
- Friston K, Kilner J, Harrison L (2006) A free energy principle for the brain. J Physiol-Paris 100:70–87. https://doi.org/10.1016/j.jphysparis.2006.10.001
- Friston K, Adams R, Montague R (2012a) What is value—accumulated reward or evidence? Front Neurorobotics 6:11. https://doi.org/10.3389/fnbot.2012.00011
- Friston K, Adams RA, Perrinet L, Breakspear M (2012b) Perceptions as hypotheses: saccades as experiments. Front Psychol 3:151. https://doi.org/10.3389/fpsyg.2012.00151
- Friston K, Samothrakis S, Montague R (2012c) Active inference and agency: optimal control without cost functions. Biol Cybernet 106:523–541. https://doi.org/10.1007/s00422-012-0512-8
- Friston K, Schwartenbeck P, FitzGerald T, Moutoussis M, Behrens T, Dolan RJ (2014) The anatomy of choice: dopamine and decision-making. Philos Trans R Soc B Biol Sci 369:20130481. https://doi.org/10.1098/rstb.2013.0481
- Friston K, Rigoli F, Ognibene D, Mathys C, Fitzgerald T, Pezzulo G (2015) Active inference and epistemic value. Cognit Neurosci 6:187–214. https://doi.org/10.1080/17588928.2015.1020053
- Friston K, FitzGerald T, Rigoli F, Schwartenbeck P, O’Doherty J, Pezzulo G (2016) Active inference and learning. Neurosci Biobehav Rev 68:862–879. https://doi.org/10.1016/j.neubiorev.2016.06.022
- Friston K, FitzGerald T, Rigoli F, Schwartenbeck P, Pezzulo G (2017a) Active inference: a process theory. Neural Comput 29:1–49. https://doi.org/10.1162/NECO_a_00912
- Friston KJ, Lin M, Frith CD, Pezzulo G, Hobson JA, Ondobaka S (2017b) Active inference, curiosity and insight. Neural Comput 29(10):2633–2683
- Friston KJ, Parr T, de Vries B (2017c) The graphical brain: belief propagation and active inference. Netw Neurosci 1:381–414. https://doi.org/10.1162/NETN_a_00018
- Friston KJ, Rosch R, Parr T, Price C, Bowman H (2017d) Deep temporal models and active inference. Neurosci Biobehav Rev 77:388–402. https://doi.org/10.1016/j.neubiorev.2017.04.009
- Ghirardato P, Marinacci M (2002) Ambiguity made precise: a comparative foundation. J Econ Theory 102:251–289. https://doi.org/10.1006/jeth.2001.2815
- Gilhooly (2005) Working memory and planning, 1st edn. In: Morris R, Ward G (eds) The cognitive psychology of planning. Psychology Press, London, 256 p. https://doi.org/10.4324/9780203493564
- Gregory RL (1980) Perceptions as hypotheses. Philos Trans R Soc Lond B Biol Sci 290:181
- Hikosaka O, Takikawa Y, Kawagoe R (2000) Role of the basal ganglia in the control of purposive saccadic eye movements. Physiol Rev 80:953
- Hohwy J (2016) The self-evidencing brain. Noûs 50:259–285. https://doi.org/10.1111/nous.12062
- Huggins JH, Tenenbaum JB (2015) Risk and regret of hierarchical Bayesian learners. Paper presented at the 32nd international conference on machine learning, vol 37, Lille, France
- Hwa R (2004) Sample selection for statistical parsing. Comput Linguist 30:253–276
- Kaplan R, Friston KJ (2018) Planning and navigation as active inference. Biol Cybernet. https://doi.org/10.1007/s00422-018-0753-2
- Kappen HJ, Gomez Y, Opper M (2012) Optimal control as a graphical model inference problem. Mach Learn 87:159–182
- Krause A (2008) Optimizing sensing: theory and applications. Carnegie Mellon University, Pittsburgh
- Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval. Springer-Verlag New York, Inc., pp 3–12
- Lindley DV (1956) On a measure of the information provided by an experiment. Ann Math Stat 27:986–1005. https://doi.org/10.1214/aoms/1177728069
- Linsker R (1990) Perceptual neural organization: some approaches based on network models and information theory. Annu Rev Neurosci 13:257–281
- Lloyd K, Leslie DS (2013) Context-dependent decision-making: a simple Bayesian model. J R Soc Interface 10:1. https://doi.org/10.1098/rsif.2013.0069
- MacKay DJC (1992) Information-based objective functions for active data selection. Neural Comput 4:590–604. https://doi.org/10.1162/neco.1992.4.4.590
- McKay RT, Dennett DC (2010) The evolution of misbelief. Behav Brain Sci 32:493–510. https://doi.org/10.1017/S0140525X09990975
- Mirza MB, Adams RA, Mathys CD, Friston KJ (2016) Scene construction, visual foraging, and active inference. Front Comput Neurosci. https://doi.org/10.3389/fncom.2016.00056
- Moiseiwitsch BL (2013) Variational principles. Dover Publications, Mineola
- Moutoussis M, Trujillo-Barreto NJ, El-Deredy W, Dolan RJ, Friston KJ (2014) A formal model of interpersonal inference. Front Hum Neurosci 8:160. https://doi.org/10.3389/fnhum.2014.00160
- Optican L, Richmond BJ (1987) Temporal encoding of two-dimensional patterns by single units in primate inferior cortex. II Information theoretic analysis. J Neurophysiol 57:132–146
- Ortega PA, Braun DA (2010) A minimum relative entropy principle for learning and acting. J Artif Int Res 38:475–511
- Parr T, Friston KJ (2017a) The computational anatomy of visual neglect. Cereb Cortex. https://doi.org/10.1093/cercor/bhx316
- Parr T, Friston KJ (2017b) Uncertainty, epistemics and active inference. J R Soc Interface 14:20170376
- Parr T, Friston KJ (2017c) Working memory, attention, and salience in active inference. Sci Rep 7:14678. https://doi.org/10.1038/s41598-017-15249-0
- Parr T, Friston KJ (2018) The discrete and continuous brain: from decisions to movement—and back again. Neural Comput 30:1–10
- Parr T, Friston KJ (2019) The computational pharmacology of oculomotion. Psychopharmacology. https://doi.org/10.1007/s00213-019-05240-0
- Parr T, Benrimoh D, Vincent P, Friston K (2018a) Precision and false perceptual inference. Front Integr Neurosci. https://doi.org/10.3389/fnint.2018.00039
- Parr T, Rees G, Friston KJ (2018b) Computational neuropsychology and Bayesian inference. Front Hum Neurosci. https://doi.org/10.3389/fnhum.2018.00061
- Parr T, Rikhye RV, Halassa MM, Friston KJ (2019) Prefrontal computation as active inference. Cereb Cortex. https://doi.org/10.1093/cercor/bhz118
- Pearl J (1998) Graphical models for probabilistic and causal reasoning. In: Smets P (ed) Quantified representation of uncertainty and imprecision. Springer Netherlands, Dordrecht, pp 367–389. https://doi.org/10.1007/978-94-017-1735-9_12
- Pearl J (2014) Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier, Amsterdam
- Prosser A, Friston KJ, Bakker N, Parr T (2018) A Bayesian account of psychopathy: a model of lacks remorse and self-aggrandizing. Comput Psychiatry. https://doi.org/10.1162/cpsy_a_00016
- Rasmussen CE, Ghahramani Z (2001) Occam's razor, advances in neural information processing systems 13. In: Leen TK, Dietterich TG, Tresp V (eds) Proceedings from the conference, neural information processing systems. https://papers.nips.cc/book/advances-in-neural-information-processing-systems-13-2000
- Sales AC, Friston KJ, Jones MW, Pickering AE, Moran RJ (2018) Locus Coeruleus tracking of prediction errors optimises cognitive flexibility: an Active Inference model. bioRxiv:340620
- Schacter DL, Benoit RG, De Brigard F, Szpunar KK (2015) Episodic future thinking and episodic counterfactual thinking: intersections between memory and decisions. Neurobiol Learn Mem 117:14–21. https://doi.org/10.1016/j.nlm.2013.12.008
- Schwartenbeck P, FitzGerald TH, Mathys C, Dolan R, Friston K (2015a) The dopaminergic midbrain encodes the expected certainty about desired outcomes. Cereb Cortex 25:3434–3445. https://doi.org/10.1093/cercor/bhu159
- Schwartenbeck P, FitzGerald TH, Mathys C, Dolan R, Kronbichler M, Friston K (2015b) Evidence for surprise minimization over value maximization in choice behavior. Sci Rep 5:16575. https://doi.org/10.1038/srep16575
- Schwartenbeck P, FitzGerald TH, Mathys C, Dolan R, Wurst F, Kronbichler M, Friston K (2015c) Optimal inference with suboptimal models: addiction and active Bayesian inference. Med Hypotheses 84:109–117. https://doi.org/10.1016/j.mehy.2014.12.007
- Settles B (2010) Active learning literature survey, vol 52. University of Wisconsin, Madison, p 11
- Sharot T (2011) The optimism bias. Curr Biol 21:R941–R945. https://doi.org/10.1016/j.cub.2011.10.030
- Sharot T, Guitart-Masip M, Korn Christoph W, Chowdhury R, Dolan Raymond J (2012) How dopamine enhances an optimism bias in humans. Curr Biol 22:1477–1481. https://doi.org/10.1016/j.cub.2012.05.053
- Shewry MC, Wynn HP (1987) Maximum entropy sampling. J Appl Stat 14:165–170. https://doi.org/10.1080/02664768700000020
- Strens MJA (2000) A Bayesian framework for reinforcement learning. Paper presented at the proceedings of the seventeenth international conference on machine learning
- Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge
- Todorov E (2008) General duality between optimal control and estimation. In: IEEE conference on decision and control
- Verma D, Rao RP (2006) Planning and acting in uncertain environments using probabilistic inference. In: 2006 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 2382–2387
- Vincent P, Parr T, Benrimoh D, Friston KJ (2019) With an eye on uncertainty: Modelling pupillary responses to environmental volatility. PLOS Comput Biol 15:e1007126. https://doi.org/10.1371/journal.pcbi.1007126
- Wald A (1947) An essentially complete class of admissible decision functions. Ann Math Stat 4:549–555. https://doi.org/10.1214/aoms/1177730345
- Winn JM (2004) Variational message passing and its applications. Citeseer
- Winn J, Bishop CM (2005) Variational message passing. J Mach Learn Res 6:661–694
- Yang SC-H, Lengyel M, Wolpert DM (2016a) Active sensing in the categorization of visual patterns. eLife 5:e12215. https://doi.org/10.7554/elife.12215
- Yang SC-H, Wolpert DM, Lengyel M (2016b) Theoretical perspectives on active sensing. Curr Opin Behav Sci 11:100–108. https://doi.org/10.1016/j.cobeha.2016.06.009
- Yedidia JS, Freeman WT, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51:2282–2312

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.