Computational Resource Demands of a Predictive Bayesian Brain
Abstract
There is a growing body of evidence that the human brain may be organized according to principles of predictive processing. An important conjecture in neuroscience is that a brain organized in this way can effectively and efficiently approximate Bayesian inferences. Given that many forms of cognition seem to be well characterized as a form of Bayesian inference, this conjecture has great import for cognitive science. It suggests that predictive processing may provide a neurally plausible account of how forms of cognition that are modeled as Bayesian inference may be physically implemented in the brain. Yet, as we show in this paper, the jury is still out on whether or not the conjecture is really true. Specifically, we demonstrate that each key subcomputation invoked in predictive processing potentially hides a computationally intractable problem. We discuss the implications of these sobering results for the predictive processing account and propose a way to move forward.
Keywords
Theoretical cognitive neuroscience, Predictive processing, Bayesian modeling, Computational complexity theory, NP-hard, Intractability, Approximation
Introduction
The predictive processing account is becoming increasingly popular as an account of perceptual, behavioral, and neural phenomena in cognitive neuroscience. According to this account, the brain seeks to predict its sensory inputs using a hierarchy of probabilistic generative models (Clark 2013, 2016; Hohwy 2013). The account integrates ideas from different traditions, including (1) classical views on perception as an inferential process combining sensory input and prior knowledge (von Helmholtz 1867; Barlow 1961); (2) the view of the brain as encoding and processing uncertain information using Bayesian mechanisms (Dayan et al. 1995; Knill and Pouget 2004); (3) the view of the brain as a series of hierarchical generative (inverse) models (Friston 2008); and (4) the free energy principle in theoretical biology (Friston 2010). In addition, the account draws inspiration from computer science (e.g., the predictive coding approach in signal processing; Vaseghi 2000) and philosophy (e.g., the work by Kant 1999/1787 on cognition and perception). While originally rooted in visual perception research (Lee and Mumford 2003; Rao and Ballard 1999; Hohwy et al. 2008), it has been successfully extended to combine action and perception in so-called active inference theories (Brown et al. 2011; Adams et al. 2013), applied in computational psychiatry (Edwards et al. 2012; Horga et al. 2014; Sterzer et al. 2018; Van de Cruys et al. 2014), and used to explain action understanding (Kilner et al. 2007a, b; Den Ouden et al. 2012) as well as phenomena such as dreaming (Hobson and Friston 2012), hallucinations (Brown and Friston 2012), conscious presence (Seth et al. 2011; Seth 2015), self-awareness (Seth and Tsakiris 2018), synesthesia (Rothen et al. 2018), and psychedelics (Pink-Hashkes et al. 2017).
… unconstrained Bayesian inference is not a viable solution for computation in the brain. (Knill and Pouget 2004, p. 758)
Both Clark (2016, p. 298) and Thornton (2016, p. 6) have argued that the predictive processing account demands approximate Bayesian inferences to be computationally tractable. However, the above-mentioned intractability results imply that approximate inference may be necessary, but is definitely not sufficient, to ensure tractability of Bayesian computations (Kwisthout 2018; Donselaar 2018). That is, more is needed than an appeal to approximation as a cure for the ailment of Bayesian intractability (Kwisthout et al. 2011).
It is thus a major virtue of the hierarchical predictive coding account that it effectively implements a computationally tractable version of the so-called Bayesian Brain Hypothesis. (Clark 2013, p. 191)
[The] predictive processing story, if correct, would rather directly underwrite the claim that the nervous system approximates, using tractable computational strategies, a genuine version of Bayesian inference. (Clark 2013, p. 189)
If Clark is right, then the hypothesis that brains are organized according to the principles of predictive processing is not only of great importance for neuroscience (as a hypothesis about the modus operandi of the human brain) but also for contemporary cognitive science. After all, it would directly suggest a candidate explanation of how the probabilistic computations postulated by the many Bayesian models in cognitive science (Chater et al. 2006; Griffiths et al. 2008, 2010) may be realistically implemented in the human brain. The promise that the predictive processing framework would yield a tractable way to perform (approximate) Bayesian inference seems all the more important in light of the fact that Bayesian models of cognition have been plagued by complaints about their apparent computational intractability (e.g., Gigerenzer 2008; Kwisthout et al. 2011). Having a candidate neural story on offer about how Bayesian inference may yet be tractable for human brains would, of course, further strengthen the already strong case for the Bayesian approach in cognitive science (see, e.g., Chater et al. 2006; Tenenbaum 2011).
In this paper, we set out to analyze to what extent the framework already makes good on this promise, or otherwise could in the future. The approach that we take is as follows. We formally model each required subcomputation postulated by the predictive processing framework for the forward (bottom–up) and backward (top–down) chains of processing. Our models characterize these computations at Marr’s (1982) computational level, i.e., in terms of the basic input–output transformations that they are assumed to perform. This means that our analyses will be independent of the nature of the algorithmic- or implementational-level processes that the brain may use to perform these transformations (van Rooij 2008; van Rooij et al. 2019). We distinguish three key transformations: prediction, error computation, and explaining away. We will show that, unless the causal models underlying these subcomputations are somehow constrained, both making predictions and explaining away prediction errors are in and of themselves intractable to compute (e.g., NP-hard), whether exactly or approximately.^{1}
The remainder of this paper is organized as follows.^{2} First, we explain the basic ideas of the predictive processing framework in more detail. After providing the necessary preliminaries, definitions, and notational conventions in “Preliminaries and Notation”, we present our formal models of its postulated subprocesses in “Computational Modeling”. We then give an overview of the computational (in)tractability results for these formal models (“Results”), and discuss the implications of our findings for the presumed tractability of a predictive processing account of the Bayesian brain (“Discussion”).
The Predictive Processing Framework
By comparing predicted observations at each level n in this hierarchy with the actual observations at the same level n in this hierarchy, the system can determine the extent to which its predicted observations in the backward chain match the observations arising from the forward chain, and update its hypotheses about the world accordingly. An example of the explanatory uses of the predictive processing framework is its explanation of binocular rivalry in vision (Hohwy et al. 2008; Weilnhammer et al. 2017). When an image of a house is presented to the left eye, and an image of a face to the right eye, the subjective experience of the images alternates between a face and a house, rather than some combination of the separate stimuli. As we are not familiar with blended house-faces, such a combination would have a low prior probability, hence either a house or a face is predicted to be observed. The actual observation, however, triggers a prediction error: a mismatch between what was predicted (e.g., a face), and what was observed (both a house and a face). This mismatch then leads to an updated hypothesis; taking prior probabilities as well as the prediction error into account, the hypothesis will then shift toward a house, rather than a combination of a face and a house.
In predictive processing, three separate processes are assumed to operate on the generative models: viz., prediction (computing the observations that are predicted at level n, given the causal model at level n and the predictions at level n + 1 in the backward chain); error computation (computing the divergence between the predicted observations at level n in the backward chain and the actual observations at level n in the forward chain); and explaining away prediction errors. The latter process can be further distinguished into hypothesis updating (updating hypotheses at level n of the forward chain based on the prediction error between predicted and actual observations at level n); model revision (revising the parameters or structure of the generative model); active inference (actively intervening in the world, bringing its actual state closer to the predicted state); or adding observations (gathering information on the state of contextual cues or other hidden variables in the generative model).
Although several task- or domain-specific computational models have been developed that apply the ideas underlying the predictive processing framework to the analysis of particular perceptual and neural processes (e.g., Grush 2004; Jehee and Ballard 2009; Rao and Ballard 1999), such specific models cannot directly be used to address our research question: “Is Bayesian inference tractable when implemented in a predictive processing architecture?” To address this question, we instead need generic computational models, i.e., models that are general enough to be applicable, in principle, to any cognitive domain. In particular, these models should allow for the representation of structured information (Griffiths et al. 2010). We propose such generic models in “Computational Modeling.” We first provide the necessary preliminaries.
Preliminaries and Notation
In this section, we introduce mathematical concepts and notation that we use throughout. To start, a causal Bayesian network \(\mathcal{B} = (\mathbf{G}_{\mathcal{B}}, \text{Pr}_{\mathcal{B}})\) is a graphical structure that models a set of stochastic variables, the conditional independences among these variables, and a joint probability distribution over these variables (Pearl 1988). \(\mathcal{B}\) includes a directed acyclic graph \(\mathbf{G}_{\mathcal{B}} = (\mathbf{V}, \mathbf{A})\), modeling the variables and conditional independences in the network, and a set of conditional probability tables (CPTs) \(\text{Pr}_{\mathcal{B}}\) capturing the stochastic dependences between the variables. The network models a joint probability distribution \(\text{Pr}(\mathbf{V}) = \prod_{i=1}^{n} \text{Pr}(V_{i} \mid \pi(V_{i}))\) over its variables, where π(V_{i}) denotes the parents of V_{i} in \(\mathbf{G}_{\mathcal{B}}\) (Kiiveri et al. 1984). The arcs have a causal interpretation, the semantics of which is given by the do-calculus (Pearl 2000). By convention, we use uppercase letters to denote individual nodes in the network, uppercase bold letters to denote sets of nodes, lowercase letters to denote value assignments to nodes, and lowercase bold letters to denote joint value assignments to sets of nodes. We use the notation Ω(V_{i}) to denote the set of values that V_{i} can take. Likewise, Ω(V) denotes the set of joint value assignments to the set of variables V. For brevity and readability, we often omit the set of variables over which a probability distribution is defined if it is clear from the context.
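To make the factorization concrete, the following minimal Python sketch evaluates the joint distribution of a tiny network as the product of its CPT entries. The three-node chain H → I → P, its binary values, and all probabilities are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from itertools import product

# Hypothetical binary chain H -> I -> P (illustrative numbers only).
# parents[node] lists the parents pi(node); each CPT maps a tuple of
# parent values to a distribution over the node's own values.
parents = {"H": (), "I": ("H",), "P": ("I",)}
cpt = {
    "H": {(): {0: 0.7, 1: 0.3}},
    "I": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "P": {(0,): {0: 0.6, 1: 0.4}, (1,): {0: 0.1, 1: 0.9}},
}


def joint_probability(assignment):
    """Pr(V) as the product over all nodes V_i of Pr(V_i | pi(V_i))."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][parent_values][assignment[node]]
    return prob


def all_assignments():
    """Enumerate every joint value assignment to the three binary nodes."""
    nodes = list(parents)
    for values in product([0, 1], repeat=len(nodes)):
        yield dict(zip(nodes, values))


# The joint probabilities of all assignments should sum to one.
total = sum(joint_probability(a) for a in all_assignments())
```

Enumerating all joint value assignments, as `all_assignments` does, takes time exponential in the number of variables, which is why it is usable only for toy networks like this one.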
Predictive processing is often construed as using a hierarchy of Bayesian generative models, where posterior probabilities on one level of the hierarchy provide priors for its subordinate level (e.g., Kilner et al. 2007b; Lee and Mumford 2003; Rao and Ballard 1999). Consistent with our earlier work (Kwisthout et al. 2017), we propose a general computational model where each level of the hierarchy can be seen (for the sake of computational analysis) as a separate causal Bayesian network \(\mathcal{B}_{L}\), where the variables are partitioned into a set of hypothesis variables Hyp, a set of prediction variables Pred, and a set of intermediate variables Int, describing contextual dependences and (possibly complicated) structural dependences between hypotheses and predictions. We assume that all variables in Hyp are source variables, all variables in Pred are sink variables, and that the Pred variables in \(\mathcal{B}_{L}\) are identified with the Hyp variables in \(\mathcal{B}_{L+1}\) for all levels of the hierarchy save the lowest one. We distinguish between the prior probability of a set of variables (such as Hyp) in the network, which we assume is stable over time, and the current inferred distribution over that set. For example, given that we identify the Pred variables in \(\mathcal{B}_{L}\) with the Hyp variables in \(\mathcal{B}_{L+1}\), any inferred probability distribution for Pred in \(\mathcal{B}_{L}\) is “copied” to the Hyp variables in \(\mathcal{B}_{L+1}\), independent of the prior distribution over Hyp. Technically, we establish this by means of so-called virtual evidence (Bilmes 2004).
The Hamming distance d_{H}(p, p^{′}) is a measure of how two joint value assignments p and p^{′} to a set of variables P diverge from each other; it simply adds up the number of values in the joint value assignment where the two joint value assignments disagree (Hamming 1950). It can be seen as the size of the prediction error when predictions and observations are defined as joint value assignments, rather than distributions. For ease of exposition, in the remainder, we abbreviate d_{H}(p_{Obs}, p_{Pred}) to d_{H} when the distance between predicted and observed joint value assignments to Pred is computed.
Again, also in this formalization, the size of the prediction error is not to be confused with the prediction error itself. We define the prediction error d(p, p^{′}) as an element-wise difference function: for each pair of corresponding elements (p, p^{′}) of the joint value assignments p and p^{′} to the prediction variables Pred, d(p, p^{′}) = nil if p = p^{′} and d(p, p^{′}) = p if p ≠ p^{′}; here, nil is a special blank symbol. Note that the size of the prediction error d_{H} = 0 if and only if the prediction error is vacuous, i.e., every element is nil.
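The distinction between the prediction error and its size can be illustrated with a short Python sketch. It assumes that joint value assignments are represented as equal-length tuples and that Python's `None` stands in for the blank symbol nil (so `None` must not be a legitimate variable value); the function names and the example values are hypothetical.

```python
NIL = None  # assumed stand-in for the special blank symbol "nil"


def prediction_error(p, p_prime):
    """Element-wise d(p, p'): nil where the two assignments agree,
    the element of the first argument where they disagree."""
    if len(p) != len(p_prime):
        raise ValueError("assignments must cover the same variables")
    return tuple(NIL if a == b else a for a, b in zip(p, p_prime))


def hamming_distance(p, p_prime):
    """Size of the prediction error d_H: number of disagreeing positions."""
    if len(p) != len(p_prime):
        raise ValueError("assignments must cover the same variables")
    return sum(1 for a, b in zip(p, p_prime) if a != b)


# Hypothetical observation and prediction over three prediction variables.
observed = ("house", "left", "bright")
predicted = ("face", "left", "bright")

error = prediction_error(observed, predicted)
size = hamming_distance(observed, predicted)
```

Both functions make a single pass over the prediction variables. In this sketch the size equals the number of non-nil entries of the error, so the error is vacuous exactly when the size is zero.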
Computational Modeling
The hierarchy of coupled subcomputations that we sketched above and in Fig. 1 can be seen as a scheme for candidate algorithmic-level explanations of how perceptual and cognitive inferences (often well captured by Bayesian models situated at the computational level of Marr (1982) and presumably living higher up in such hierarchies) could be computed by human brains. For such an algorithm to be tractable, all of its subcomputations need to be tractable. Here, we present models of the three subcomputations (prediction, error computation, and explaining away) that are also situated at Marr’s computational level. That is, the models merely characterize the nature of the transformation the subcomputations achieve while not committing to any further hypotheses about how these subcomputations are in turn subdivided into sub-subcomputations.
Before presenting our models, we note that it is difficult to settle on a single candidate computational-level model per subcomputation, because several different (sometimes mutually inconsistent) interpretations of the predictive processing framework can be found in the literature (e.g., Friston 2002, 2005; Hohwy et al. 2008; Kilner et al. 2007b; Lee and Mumford 2003; Knill and Pouget 2004). The most prominent aspects in which interpretations diverge are in the conceptualization of predictions and of hypothesis revision. Some authors interpret the inference steps as updating one’s probability distribution over candidate hypotheses (Lee and Mumford 2003, p. 1437), and others interpret them as fixating one’s belief to the most probable ones (Kilner et al. 2007b, p. 161). These two notions correspond to two different computational problems in Bayesian networks, viz., the problem of computing a posterior probability distribution and the problem of computing the mode of a posterior probability distribution (i.e., the joint value assignment with the highest posterior probability). Following conventions in the machine learning literature, we apply the suffix SUM to the distribution updating variants, and MAX to the belief fixation model variants.
When explaining away prediction errors by hypothesis updating (that is, adapting our current best explanation of the causes of the observations we make, contrasted to, e.g., active inference), there are again two possible conceptualizations in both the SUM and MAX variants. In the belief updating conceptualization, the current hypothesis distribution (SUM) or current belief (MAX) is updated by computing the posterior hypothesis distribution given the observed distribution (SUM) or the hypothesis that has the highest posterior probability given the evidence (MAX); in either situation, both the likelihood and the prior probability of the hypotheses are taken into account (Friston 2002, p. 13). In contrast, in the belief revision conceptualization, the current hypothesis distribution (SUM) or current belief (MAX) is revised such that the prediction error is minimized, that is, replacing the current hypothesis distribution (SUM) or current belief (MAX) by one that has a higher likelihood given the observations (Kilner et al. 2007b, p. 161). Note that the crucial distinction between both conceptualizations is that, in the belief update conceptualization, the priors of the hypotheses are taken into consideration while they are ignored in the belief revision conceptualization. The rationale behind this is that in belief updating the aim is to globally reduce prediction error over several levels of the hierarchy, while in belief revision the aim is to minimize local prediction error within one level of the hierarchy.
Summaries of all problem variants in the computational model; see the main text for full formal definitions for these problems

| Problem | Input | Output |
|---|---|---|
| Prediction (SUM) | Probability distribution Pr(Hyp) | \(\text{Pr}(\text{Pred}) = \sum_{\mathbf{h}} \text{Pr}(\text{Pred} \mid \text{Hyp} = \mathbf{h}) \times \text{Pr}(\text{Hyp} = \mathbf{h})\) |
| Prediction (MAX) | Joint value assignment Hyp = h | \(\arg\max_{\mathbf{p}} \text{Pr}(\text{Pred} = \mathbf{p} \mid \text{Hyp} = \mathbf{h})\) |
| ErrorComputation (SUM) | Pr_{(Obs)}(Pred), Pr_{(Pred)}(Pred) | The prediction error δ_{(Obs,Pred)} |
| ErrorComputation (MAX) | Joint value assignments Pred = p, \(\text{Pred} = \mathbf{p}^{\prime}\) | The prediction error \(d(\mathbf{p}^{\prime}, \mathbf{p})\) |
| ErrorSizeComputation (SUM) | Pr_{(Obs)}(Pred), Pr_{(Pred)}(Pred) | The size of the prediction error D_{KL} |
| ErrorSizeComputation (MAX) | Joint value assignments Pred = p, \(\text{Pred} = \mathbf{p}^{\prime}\) | The size of the prediction error d_{H} |
| BeliefUpdating (SUM) | Prediction error δ_{(Obs,Pred)} | Posterior distribution \(\text{Pr}^{\prime}(\text{Hyp}) = \sum_{\mathbf{p}} \text{Pr}(\text{Hyp} \mid \text{Pred} = \mathbf{p}) \times \text{Pr}_{(\text{Obs})}(\text{Pred} = \mathbf{p})\) |
| BeliefUpdating (MAX) | Prediction error \(d(\mathbf{p}^{\prime}, \mathbf{p})\) | The most probable joint value assignment \(\arg\max_{\mathbf{h}^{\prime}} \text{Pr}(\text{Hyp} = \mathbf{h}^{\prime} \mid \text{Pred} = \mathbf{p}^{\prime})\) |
| BeliefRevision (SUM) | Prediction error δ_{(Obs,Pred)} | Revised probability distribution \(\text{Pr}^{\prime}(\text{Hyp})\) such that \(D_{\text{KL}}[\text{Hyp}^{\prime}]\) is minimal |
| BeliefRevision (MAX) | Prediction error \(d(\mathbf{p}^{\prime}, \mathbf{p})\) | Revised joint value assignment \(\text{Hyp} = \mathbf{h}^{\prime}\) such that \(d_{\mathrm{H}}[\mathbf{h}^{\prime}]\) is minimal |
| ModelRevision (SUM) | Prediction error δ_{(Obs,Pred)}, set P of parameter probabilities | A combination of values 𝜃 to P such that D_{KL}[𝜃] is minimal |
| ModelRevision (MAX) | Prediction error \(d(\mathbf{p}^{\prime}, \mathbf{p})\), set P of parameter probabilities | A combination of values 𝜃 to P such that d_{H}[𝜃] is minimal |
| AddObservations (SUM) | Prediction error δ_{(Obs,Pred)}, designated and yet unobserved subset \(\mathbf{O} \subseteq \text{Int}\) | An observation O = o such that D_{KL}[o] is minimal |
| AddObservations (MAX) | Prediction error \(d(\mathbf{p}^{\prime}, \mathbf{p})\), designated and yet unobserved subset \(\mathbf{O} \subseteq \text{Int}\) | An observation O = o such that d_{H}[o] is minimal |
| Intervention (SUM) | Prediction error δ_{(Obs,Pred)}, designated subset \(\mathbf{A} \subseteq \text{Int}\) | An intervention do(A = a) such that D_{KL}[a] is minimal |
| Intervention (MAX) | Prediction error \(d(\mathbf{p}^{\prime}, \mathbf{p})\), designated subset \(\mathbf{A} \subseteq \text{Int}\) | An intervention do(A = a) such that d_{H}[a] is minimal |
SUM Variants
Prediction (SUM)
Instance: A causal Bayesian network \(\mathcal {B}_{L}\) with designated variable subsets Pred and Hyp, probability distribution Pr(Hyp) over Hyp.
Output: \(\text{Pr}(\text{Pred}) = \sum_{\mathbf{h}} \text{Pr}(\text{Pred} \mid \text{Hyp} = \mathbf{h}) \times \text{Pr}(\text{Hyp} = \mathbf{h})\), i.e., the updated marginal probability distribution over the prediction nodes Pred.
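As an illustration of Prediction (SUM), the sketch below computes the marginal over Pred by brute-force summation on a hypothetical binary chain Hyp → Int → Pred. The network, its CPT values, and the function names are assumptions made for this example only; the problem definition itself does not prescribe any algorithm.

```python
# Hypothetical binary chain Hyp -> Int -> Pred (illustrative numbers only).
pr_int_given_hyp = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pr_pred_given_int = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}


def pred_given_hyp(h):
    """Pr(Pred | Hyp = h), obtained by summing out the intermediate variable."""
    return {
        p: sum(pr_int_given_hyp[h][i] * pr_pred_given_int[i][p] for i in (0, 1))
        for p in (0, 1)
    }


def predict_sum(pr_hyp):
    """Pr(Pred) = sum over h of Pr(Pred | Hyp = h) * Pr(Hyp = h)."""
    return {
        p: sum(pred_given_hyp(h)[p] * pr_hyp[h] for h in pr_hyp)
        for p in (0, 1)
    }


# The current (inferred) distribution over Hyp is the input of the problem.
prediction = predict_sum({0: 0.7, 1: 0.3})
```

With a single intermediate variable the inner sum is trivial, but with many intermediate variables the same brute-force summation ranges over exponentially many joint value assignments.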
ErrorComputation (SUM)
Instance: A set of variables Pred and two probability distributions Pr_{(Obs)} and Pr_{(Pred)} over Pred.
Output: The prediction error δ_{(Obs, Pred)}.
ErrorSizeComputation (SUM)
Instance: As in ErrorComputation (SUM).
Output: The size of the prediction error D_{KL}.
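A minimal Python sketch of ErrorSizeComputation (SUM) follows. It reads D_{KL} as the Kullback-Leibler divergence of the predicted distribution from the observed one, i.e., the sum over p of Pr_{(Obs)}(p) log(Pr_{(Obs)}(p) / Pr_{(Pred)}(p)); the direction of the divergence, the natural logarithm, and the dictionary representation of the distributions are assumptions of this example rather than details given in this section.

```python
import math


def kl_divergence(pr_obs, pr_pred):
    """D_KL(Pr_Obs || Pr_Pred) for two distributions given as dicts that
    map each joint value assignment to its probability (assumed direction)."""
    total = 0.0
    for value, q_obs in pr_obs.items():
        if q_obs == 0.0:
            continue  # terms with zero observed probability contribute nothing
        q_pred = pr_pred.get(value, 0.0)
        if q_pred == 0.0:
            return math.inf  # observed outcome was predicted to be impossible
        total += q_obs * math.log(q_obs / q_pred)
    return total


# Hypothetical observed and predicted distributions over one binary variable.
observed = {0: 0.5, 1: 0.5}
predicted = {0: 0.25, 1: 0.75}
error_size = kl_divergence(observed, predicted)
```

The function makes one pass over the entries of the observed distribution, so its running time is linear in the size of the explicitly listed distributions.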
BeliefUpdating (SUM)
Instance: A causal Bayesian network \(\mathcal {B}_{L}\) with designated variable subsets Hyp and Pred, a probability distribution Pr_{(Pred)} over Pred, and a prediction error δ_{(Obs,Pred)}.
Output: The posterior probability distribution \(\sum_{\mathbf{p}} \text{Pr}(\text{Hyp} \mid \text{Pred} = \mathbf{p}) \times \text{Pr}_{(\text{Obs})}(\text{Pred} = \mathbf{p})\).
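The output of BeliefUpdating (SUM) can be sketched in Python for a hypothetical two-node network Hyp → Pred. The sketch assumes that the observed distribution over Pred has already been recovered from the predicted distribution and the prediction error, and it passes that observed distribution in directly; the priors, CPT values, and function names are illustrative only.

```python
# Hypothetical two-node network Hyp -> Pred (illustrative numbers only).
pr_hyp = {0: 0.7, 1: 0.3}
pr_pred_given_hyp = {0: {0: 0.55, 1: 0.45}, 1: {0: 0.2, 1: 0.8}}


def hyp_given_pred(p):
    """Pr(Hyp | Pred = p) by Bayes' rule; assumes Pr(Pred = p) > 0."""
    joint = {h: pr_pred_given_hyp[h][p] * pr_hyp[h] for h in pr_hyp}
    norm = sum(joint.values())
    return {h: value / norm for h, value in joint.items()}


def belief_update_sum(pr_obs):
    """Pr'(Hyp) = sum over p of Pr(Hyp | Pred = p) * Pr_Obs(Pred = p)."""
    return {
        h: sum(hyp_given_pred(p)[h] * pr_obs[p] for p in pr_obs)
        for h in pr_hyp
    }


# Observing Pred = 1 with certainty shifts belief toward Hyp = 1.
posterior = belief_update_sum({0: 0.0, 1: 1.0})
```

When the observed distribution coincides with the predicted marginal over Pred, the update returns the original distribution over Hyp, as one would expect when there is no prediction error.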
BeliefRevision (SUM)
Instance: As in BeliefUpdating.
Output: A (revised) prior probability distribution \(\text {Pr}_{(\text {Hyp})^{\prime }}\) over Hyp such that \(D_{\text {KL}}{[\text {Hyp}^{\prime }]}\) is minimal.
ModelRevision (SUM)
Instance: As in BeliefUpdating; furthermore, a set \(\mathbf {P}\subset \text {Pr}_{\mathcal {B}_{L}}\) of parameter probabilities.
Output: A combination of values 𝜃 to P such that D_{KL}[𝜃] is minimal.
AddObservations (SUM)
Instance: As in BeliefUpdating; furthermore, a designated and yet unobserved subset O ⊆ Int.
Output: An observation o for the variables in O such that D_{KL}[o] is minimal.
Intervention (SUM)
Instance: As in BeliefUpdating; furthermore, a designated subset A ⊆ Int.
Output: An intervention a to the variables in A such that D_{KL}[a] is minimal.
MAX Variants
Prediction (MAX)
Instance: A causal Bayesian network \(\mathcal {B}_{L}\) with designated variable subsets Pred and Hyp, a joint value assignment h to Hyp.
Output: \(\arg\max_{\mathbf{p}} \text{Pr}(\text{Pred} = \mathbf{p} \mid \text{Hyp} = \mathbf{h})\), i.e., the most probable joint value assignment p to Pred, given Hyp = h.
ErrorComputation (MAX)
Instance: A set of variables Pred and two joint value assignments p and p^{′} to Pred.
Output: The prediction error d(p^{′}, p).
ErrorSizeComputation (MAX)
Instance: As in ErrorComputation (MAX).
Output: The size of the prediction error d_{H}.
BeliefUpdating (MAX)
Instance: A causal Bayesian network \(\mathcal {B}_{L}\) with designated variable subsets Hyp and Pred, a joint value assignment p to Pred, and the prediction error d(p^{′}, p) such that p^{′} = p + d(p^{′}, p).
Output: \(\arg\max_{\mathbf{h}^{\prime}} \text{Pr}(\text{Hyp} = \mathbf{h}^{\prime} \mid \text{Pred} = \mathbf{p}^{\prime})\), i.e., the most probable joint value assignment h^{′} to Hyp, given Pred = p^{′}.
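The two steps of BeliefUpdating (MAX), recovering the observation p^{′} from the prediction p and the prediction error, and then selecting the most probable hypothesis, can be sketched in Python as follows. The sketch assumes a hypothetical model with a single three-valued hypothesis variable and two binary prediction variables that are independent given the hypothesis; this structure, the numbers, and the use of `None` for nil are illustrative assumptions, loosely modeled on the binocular rivalry example.

```python
NIL = None  # assumed stand-in for the blank symbol "nil"

# Hypothetical prior over one hypothesis variable (illustrative numbers only).
pr_hyp = {"face": 0.5, "house": 0.4, "blend": 0.1}
# Pr(Pred_j = 1 | Hyp) for two binary prediction variables, assumed to be
# conditionally independent given the hypothesis.
pr_feature_on = {"face": (0.9, 0.2), "house": (0.1, 0.8), "blend": (0.6, 0.6)}


def apply_error(p, d):
    """Recover p' = p + d(p', p): use the error entry unless it is nil."""
    return tuple(a if e is NIL else e for a, e in zip(p, d))


def likelihood(p, h):
    """Pr(Pred = p | Hyp = h) under the conditional independence assumption."""
    result = 1.0
    for value, on in zip(p, pr_feature_on[h]):
        result *= on if value == 1 else 1.0 - on
    return result


def belief_update_max(p, d):
    """Most probable hypothesis given the observation p' recovered from p and d.
    Maximizing prior * likelihood is equivalent to maximizing the posterior,
    because the normalizing constant does not depend on the hypothesis."""
    p_prime = apply_error(p, d)
    return max(pr_hyp, key=lambda h: pr_hyp[h] * likelihood(p_prime, h))


# The prediction was (1, 0); the error reports that both entries were wrong.
best = belief_update_max((1, 0), (0, 1))
```

Because the prior is taken into account, an observation with both features present, `belief_update_max((1, 0), (NIL, 1))`, still selects "face" rather than the low-prior "blend" hypothesis in this toy model.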
BeliefRevision (MAX)
Instance: As in BeliefUpdating.
Output: A (revised) joint value assignment h^{′} to Hyp such that d_{H}[h^{′}] is minimal.
ModelRevision (MAX)
Instance: As in BeliefUpdating; furthermore, a set \(\mathbf {P}\subset \text {Pr}_{\mathcal {B}_{L}}\) of parameter probabilities.
Output: A combination of values 𝜃 to P such that d_{H}[𝜃] is minimal.
AddObservations (MAX)
Instance: As in BeliefUpdating; furthermore, a designated and yet unobserved subset O ⊆ Int.
Output: An observation o for the variables in O such that d_{H}[o] is minimal.
Intervention (MAX)
Instance: As in BeliefUpdating; furthermore, a designated subset A ⊆ Int.
Output: An intervention a to the variables in A such that d_{H}[a] is minimal.
Results
Intractability Results
Result 1 Prediction is NP-hard, both in the SUM and MAX variants, even if all variables are binary, there is only a singleton hypothesis variable, and there is only a singleton prediction variable.
Result 2 ErrorComputation and ErrorSizeComputation can be computed in polynomial time, both in the SUM and MAX variants.
Result 3 BeliefUpdating and BeliefRevision are NP-hard, both in the SUM and MAX variants, even if all variables are binary, there is only a singleton hypothesis variable, there is only a singleton prediction variable, and (for BeliefRevision) the size of the prediction error is arbitrarily small.
Result 4 ModelRevision is NP-hard, both in the SUM and MAX variants, even if all variables are binary, there is only a singleton hypothesis variable, there is only a singleton prediction variable, and there is just a single parameter that can be revised.
Result 5 AddObservations and Intervention are NP-hard, both in the SUM and MAX variants, even if all variables are binary, there is only a singleton hypothesis variable, there is only a singleton prediction variable, and there is just a single variable that can be observed or intervened on, respectively.
Some important observations and implications can be drawn from our results. Without constraints on the network, Prediction, HypothesisUpdating, ModelRevision, Intervention, and AddObservations are all intractable (NP-hard) under both the SUM and the MAX interpretation, and under both the “prediction error minimization” and the “best explanation” interpretation of HypothesisUpdating. These findings make clear that a predictive processing implementation of Bayesian inference does not yet make the latter tractable; i.e., the subcomputations can themselves also be intractable. This intractability holds even under stringent additional assumptions, such as that each level in the hierarchy contains a causal model with at most one binary hypothesis variable and at most one binary observation variable. Finally, the size of the initial prediction error does not contribute to the hardness of BeliefRevision, as the intractability result holds even if the prediction error is arbitrarily close to zero. It is likely that this result can also be extended to ModelRevision, AddObservations, and Intervention, but we have not yet succeeded in proving this.
Tractability Results

The maximum number of values each variable in the Bayesian network can take (c);

The treewidth of the network (t), a graphtheoretical property that can loosely be described as a measure on the “localness” of connections in the network (Bodlaender 1993);

The size of the prediction space (|Pred|) or hypothesis space (|Hyp|);

The number of probability parameters that may be revised (|P|);

The size of the observation space (|O|) or action repertoire (|A|); and

The probability of the most probable prediction or most probable hypothesis (1 − p).
Our main findings are that both the subcomputations Prediction and HypothesisUpdating can be performed tractably when the topological structure of the Bayesian network is constrained (small t), each variable can take only a small number of distinct values (small c), and the search space of possible predictions and hypotheses is small (small |Pred| and small |Hyp|). Specifically, both SUM and MAX variants are computable in fp-tractable time O(c^{|Pred|} ⋅ c^{t} ⋅ n) for Prediction, and O(c^{|Hyp|} ⋅ c^{t} ⋅ n) for HypothesisUpdating. In addition, the MAX variants are also tractable when the prediction or hypothesis space may be large, but instead the most probable prediction (hypothesis) has a high probability (i.e., 1 − p is small). Specifically, Prediction and HypothesisUpdating are then computable in fp-tractable time \(O\left(c^{\frac{\log(p)}{\log(1-p)}} \cdot c^{t} \cdot n\right)\). For ModelRevision, Intervention, and AddObservations, similar tractability results hold, but in addition to these parameters the number of revisable probability parameters, the action repertoire, or the observation space also needs to be constrained.
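To give a feel for what such a bound implies, the following Python sketch evaluates the dominant term c^{|Pred|} ⋅ c^{t} ⋅ n for one small and one large prediction space. It ignores the constant factor hidden by the O-notation and takes n to be the number of variables in the network, which is an assumption of this example; the parameter values are hypothetical.

```python
def fpt_bound(c, t, space_size, n):
    """Dominant term c**space_size * c**t * n of the fp-tractable bound.
    Constant factors hidden by the O-notation are ignored."""
    return c ** space_size * c ** t * n


# Binary variables, treewidth 5, a network of 10,000 variables (hypothetical).
small_space = fpt_bound(c=2, t=5, space_size=5, n=10_000)
large_space = fpt_bound(c=2, t=5, space_size=100, n=10_000)

print(f"small prediction space: {small_space:.3e} steps")
print(f"large prediction space: {large_space:.3e} steps")
```

The bound grows only linearly in n, so a large network is unproblematic as long as c, t, and the size of the prediction or hypothesis space stay small; enlarging the prediction space from 5 to 100 variables multiplies the term by 2^{95}.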
Theoretical Predictions for Neuroscience
By combining the intractability and (fixed-parameter) tractability results from the previous two sections, we can derive a few theoretical predictions. First, the intractability results in “Intractability Results” establish that a brain can tractably implement predictive processing’s (Bayesian) subcomputations only if each level of the hierarchy encodes a Bayesian network that is properly constrained.^{4} Second, the tractability results in “Tractability Results” present constraints that are sufficient for tractability. Specifically, each tractability result identifies a set of parameters {k_{1}, k_{2},…, k_{m}} for which a given subcomputation (Prediction, HypothesisUpdating, etc.) is efficiently computable for relatively large Bayesian networks provided only that the values of k_{1}, k_{2},…, k_{m} are relatively small. This means that the predictive processing account can be empirically shown to meet the conditions for tractability if it can be shown that the brain’s implementation respects these constraints.^{5} As explained in van Rooij (2008) and van Rooij et al. (2019), fixed-parameter tractability results are necessarily qualitative in nature, because exact numerical predictions are not possible at this level of abstraction. But as a rule of thumb, one can consider parameters in the range of, say, 2 to 10 relatively small, and in the range of, say, 100 to 10,000 relatively large.^{6}
Empirically testing whether the brain’s predictive processing hierarchy respects these (qualitative) constraints is beyond the scope of this paper. Moreover, it is beyond state-of-the-art knowledge about the relationship between the high-level (computational-level) parameters that we analyzed in this paper and putative parameters measurable with empirical neuroscience methods. To make this relationship transparent, more theoretical neuroscience research may be needed that explores it directly and in a way designed to map out the relevant bridging hypotheses (i.e., about how the parameters at the computational level map to parameters of the neural implementation). For instance, at present, it is known that the types of Bayesian computations that we analyzed at the computational level can, in principle, be (approximately) computed by spiking artificial neural networks (Buesing et al. 2011; Habenschuss et al. 2013; Maass 2014). Also, it is known that the number of spiking neurons needed to approximate Bayesian computations is directly related to computational-level parameters, such as t and c considered in “Tractability Results” (i.e., the number of neurons is proportional to c^{t}; Pecevski et al. 2011). However, since such implementations at present still make biologically implausible assumptions about how the implementation is achieved,^{7} it is too early to know how the size of the higher-level parameters will correspond to the size (or other properties) of the putative parameters of biological neural networks in the brain. We hope that our computational-level analyses provide an incentive to prioritize theoretical and empirical neuroscience investigations into how computational-level parameters relate to parameters of the biological implementation.
Because one thing is certain: our complexitytheoretic proofs show that without proper parametric constraints at the biological level it is impossible for a predictive brain to engage in Bayesian subcomputations tractably.
A Note on Approximation
As we argued elsewhere (Kwisthout et al. 2011; Kwisthout and van Rooij 2013a; Kwisthout 2015), an approximation assumption itself cannot buy tractability. In particular, the sampling approximation algorithms that are typically assumed to be representative of the brain’s approximate computations (Maass 2014; Habenschuss et al. 2013) are provably intractable in general.^{8} Heuristic approaches, such as Laplace or mean field approximations, have no guarantees on the quality of the approximation, while in structured mean field approximations, settling the trade-off between factorization and coupling of variables might in itself be an intractable problem.^{9} However, recent advances in the study of the computational complexity of such approximate algorithms (Kwisthout 2018; Donselaar 2018) suggest that, while still intractable in general, these algorithms may be rendered tractable for certain subclasses of inputs. Whether these tractable subclasses include the sorts of approximation problems the brain is faced with is still an open problem.
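One familiar way in which the cost of sampling can grow is easy to demonstrate on a toy model. The minimal sketch below uses plain rejection sampling, which is not the neural sampling scheme of the cited work and does not reproduce the formal hardness results. The network (one binary hypothesis H with k conditionally independent binary effects), all probabilities, and all function names are our own assumptions, chosen purely for illustration.

```python
import random


def acceptance_probability(prior_h: float, p_e_given_h1: float,
                           p_e_given_h0: float, k: int) -> float:
    """Exact probability that a forward sample has all k effects equal to 1."""
    return prior_h * p_e_given_h1 ** k + (1.0 - prior_h) * p_e_given_h0 ** k


def rejection_sample(prior_h: float, p_e_given_h1: float, p_e_given_h0: float,
                     k: int, draws: int, rng: random.Random):
    """Estimate P(H = 1 | all k effects are 1) by rejection sampling.

    Returns (estimate, accepted); estimate is None if nothing was accepted.
    """
    accepted = 0
    accepted_h1 = 0
    for _ in range(draws):
        h = rng.random() < prior_h
        p_e = p_e_given_h1 if h else p_e_given_h0
        # Keep the sample only if every simulated effect matches the evidence.
        if all(rng.random() < p_e for _ in range(k)):
            accepted += 1
            accepted_h1 += int(h)
    estimate = accepted_h1 / accepted if accepted else None
    return estimate, accepted


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 10, 40):
        p_acc = acceptance_probability(0.3, 0.9, 0.2, k)
        estimate, n_acc = rejection_sample(0.3, 0.9, 0.2, k, 10_000, rng)
        print(k, p_acc, n_acc, estimate)
```

In this toy setting the acceptance rate equals the probability of the evidence, so it shrinks exponentially in k and the expected number of draws per usable sample grows accordingly. Whether the brain’s inputs fall into regimes where such costs stay manageable is exactly the open question raised above.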
Model Revision Revisited
In the context of this paper, we make a distinction between learning and revising generative models. We interpret learning as the process of gradually shaping generative models, including the hyperparameters that describe their precision, by Bayesian updating. An illustrative example could be the development of a generative model that describes the outcome of a coin toss, i.e., the probability distribution over the outcomes and the precision of that model, based on several experiences with coin tosses. In contrast, we interpret model revision as the process of accommodating a dynamic, changing world in the light of unexpected prediction errors; that is, the adaptation of generative models with the aim of lowering prediction error. In this paper, we operationalize model revision as parameter tuning, as a sort of “baseline” of the most straightforward and established way of revising generative models. However, in reality, this is an oversimplification: One can also revise models by adding new variables or adding values of variables so as to accommodate contextual influences. For example, one’s generative model that describes the expected action “handshake” as predicted by the assumed intention “friendly greeting” can be updated by including a contextual variable “age” (young/old) and adding the action “fistbump” to the repertoire. In addition, we might need to generate new hypotheses and integrate them into our generative models, a process known as abduction proper (cf. Blokpoel et al. 2018). These aspects of model revision are yet to be studied in the context of the predictive processing account.
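The coin toss example of learning by Bayesian updating can be sketched in a few lines. The sketch below assumes a Beta prior over the probability of heads, which is a standard textbook choice rather than something specified in this paper, and it uses the inverse variance of that Beta distribution as a stand-in for the precision of the model. The class name and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(heads, tails) belief about the probability that a coin lands heads."""
    heads: float = 1.0  # pseudo-count for heads (uniform prior by default)
    tails: float = 1.0  # pseudo-count for tails

    def update(self, outcome_is_heads: bool) -> "BetaBelief":
        """Bayesian update after observing a single toss."""
        if outcome_is_heads:
            return BetaBelief(self.heads + 1.0, self.tails)
        return BetaBelief(self.heads, self.tails + 1.0)

    @property
    def mean(self) -> float:
        """Predicted probability of heads."""
        return self.heads / (self.heads + self.tails)

    @property
    def variance(self) -> float:
        """Variance of the Beta distribution; it shrinks as evidence accumulates."""
        total = self.heads + self.tails
        return (self.heads * self.tails) / (total ** 2 * (total + 1.0))

    @property
    def precision(self) -> float:
        """Inverse variance, used here as an assumed proxy for model precision."""
        return 1.0 / self.variance


if __name__ == "__main__":
    belief = BetaBelief()
    for toss in (True, True, False, True):
        belief = belief.update(toss)
        print(belief.mean, belief.precision)
```

Each toss shifts the predicted distribution and, with more tosses, increases the precision. Model revision in the richer sense described above, such as adding the variable “age” or the action “fistbump”, would change the structure of the model itself and is not captured by this kind of parameter updating.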
Discussion
The hypothesis that a predictive processing implementation of a Bayesian brain may harbor some intractable subcomputations, especially when the account is assumed to scale to all of cognition (Kwisthout et al. 2017; Otworowska et al. 2014), has been put forth before (Blokpoel et al. 2012). In this paper, we support and extend this hypothesis by assessing the computational resource demands of each key subcomputation required for a predictive Bayesian brain. To this end, we formulated explicit computational-level characterizations of the subcomputations, while accommodating different possible interpretations in the literature.
Complexity analyses of the subcomputations reveal that most are intractable (in the sense of NP-hard; see Result 1 and Results 3, 4, 5), unless the causal models that are coded by the hierarchy are somehow constrained.
These complexity-theoretic results are important for researchers in the field to know about, given that part of the popularity of the predictive processing account seems due to its claims of computational tractability: For instance, the account postulates that, by processing only the prediction error rather than the whole input signal, tractable inferences can be made provided that the prediction error is small. However, our results show that these claims are not substantiated for generative models based on the structured representations (such as causal Bayesian networks) that are required for higher cognitive capacities (Kwisthout et al. 2017; Otworowska et al. 2014; Tenenbaum 2011).
We realize that our mathematical proofs and results may seem counterintuitive given that it is often tacitly assumed or explicitly claimed that the computations postulated by predictive processing can be tractably approximated by algorithms implemented by the brain (e.g., Bruineberg et al. 2018; Clark 2013; Swanson 2016). The apparent mismatch between such claims and our findings can be understood as follows (see also “A Note on Approximation”): the claims of tractable approximation in the literature are in one way or another based on simplifying representational assumptions. These can be, for instance, the assumption of Gaussian probability distributions as in Laplace approximation (Friston et al. 2007) or the assumption of independence of variables (despite their potential dependence in reality) as in variational methods (Bruineberg et al. 2018). These assumptions specifically prohibit richly structured representations that have been argued to be essential as models of cognition and are therefore pursued by Bayesian modelers in cognitive science (Chater et al. 2006; Griffiths et al. 2008, 2010; Tenenbaum 2011); see also Kwisthout et al. (2017).
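What an independence assumption discards can be seen in a two-variable toy example. The sketch below replaces a joint distribution over two dependent binary variables by the product of its marginals. This product is the best fully factorized fit under the divergence KL(P‖Q), whereas variational mean field methods minimize the reverse divergence, so the code illustrates the representational limitation and not any particular variational algorithm. The joint distribution and all names are hypothetical.

```python
import math
from itertools import product

# Hypothetical joint distribution P(A, B) over two strongly dependent binary variables.
JOINT = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}


def factorized(joint: dict) -> dict:
    """Product of the marginals of A and B, i.e., A and B treated as independent."""
    p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (0, 1)}
    p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for b in (0, 1)}
    return {(a, b): p_a[a] * p_b[b] for a, b in product((0, 1), repeat=2)}


def kl_divergence(p: dict, q: dict) -> float:
    """KL(p || q) in nats, assuming q is positive wherever p is positive."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0.0)


def conditional_b_given_a(dist: dict, a: int, b: int) -> float:
    """P(B = b | A = a) under the given joint distribution."""
    return dist[(a, b)] / sum(dist[(a, y)] for y in (0, 1))


if __name__ == "__main__":
    approx = factorized(JOINT)
    print(conditional_b_given_a(JOINT, 1, 1))   # prediction that uses the dependence
    print(conditional_b_given_a(approx, 1, 1))  # prediction after factorization
    print(kl_divergence(JOINT, approx))         # information lost by factorizing
```

Under the original joint, observing A = 1 makes B = 1 highly probable (0.9), whereas under the factorized approximation the observation is uninformative and the prediction falls back to the marginal of B (0.5). The computation has become simpler only because the structure that carried the inference has been removed.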
Holding onto overly simplified representational assumptions may buy one tractability, but it would not allow the predictive processing framework to scale to higher cognition. This route would severely limit the framework’s relevance for cognitive neuroscience, a route we suspect few predictive processing theorists would prefer. Moreover, the complexity-theoretic results presented in this paper establish that the popular “approximation” methods cannot genuinely approximate predictive processing computations over structured representations, as their outputs will be arbitrarily off from the outputs corresponding to the predictive processing subcomputations defined over Bayesian networks for most inputs.
To forestall misunderstanding, we would like to clarify that we do not wish to suggest that the brain cannot tractably implement the Bayesian inferences required for human cognition, nor do we suggest that the brain cannot do so using a predictive processing architecture. In fact, we think that the hypothesis that the brain implements tractable Bayesian inferences, possibly using predictive processing, is not implausible. What our results show, however, is that if the brain indeed tractably implements Bayesian inference, then that is not because the inferences are implemented using predictive processing per se. Furthermore, our results suggest that the key to tractable inference in the brain may rather lie in the properties of the causal models that it represents at the different levels of such a hierarchy. Our complexity results thus confirm earlier theoretical work that shows that while theoretical models at the neuronal level (e.g., spiking neural networks) can be shown to be able to approximate posteriors in arbitrary Bayesian networks, the number of nodes grows exponentially with the order of stochastic relations (and thus also with the treewidth of the network) in the probability distribution (Pecevski et al. 2011).
With our fixed-parameter tractability analyses (see “Tractability Results”), we illustrate how one may go about identifying the properties that make Bayesian inference tractable (see also Blokpoel et al. 2012, 2013; Kwisthout 2011; Kwisthout et al. 2011). Some of these properties are topological in nature (i.e., the structure of the causal models), whereas others pertain to the number of competing hypotheses that the brain considers per level of the hierarchy. In light of our findings, we propose that an important topic of empirical and theoretical investigations in cognitive neuroscience should be to investigate whether or not the brain’s causal models, and their biological implementations (“Theoretical Predictions for Neuroscience”), have the properties that are necessary for tractable predictive processing. It seems that this is the only way in which the tractability claims about predictive processing can be more than castles in the air.
Footnotes
 1.
Relative to common and widely endorsed assumptions in theoretical computer science; in particular, the assumptions that P≠NP (Goldreich 2008) and that BPP≠NP (Clementi et al. 1998). Here, BPP (short for bounded-error probabilistic polynomial-time) is the class of problems that enjoy efficient randomized algorithms.
 2.
More technical preliminaries from complexity theory and full details of the formal intractability proofs can be found in the Supplementary Materials.
 3.
Again, assuming P≠NP.
 4.
This is a necessary condition for tractability, even when approximate computation is assumed (see “A Note on Approximation”).
 5.
 6.
To illustrate, assume that c = 2, Pred = 8, and t = 5. Then, say, O(c^{Pred} ⋅ c^{t} ⋅ n) would be upper-bounded by a polynomial O(n^{2}) for n ≥ 8,200.
 7.
Including, for example, handcrafted deterministic dependences between variables in order to represent higher order stochastic interactions.
 8.
Under the common complexity-theoretical assumption that BPP≠NP (Clementi et al. 1998).
 9.
Nils Donselaar, personal communications.
Notes
Acknowledgements
We are grateful to Mark Blokpoel, Lieke Heil, Maria Otworowska, and Nils Donselaar for helpful discussions and comments on earlier versions of this manuscript, and to the anonymous reviewers for very constructive feedback. This paper builds on preliminary results presented at CogSci 2013 (Kwisthout and van Rooij 2013b) and in a conference paper (Kwisthout 2014). This research was funded by an NWO-TOP grant (40711040) awarded to IvR; NWO had no involvement in study design, collection, analysis and interpretation of data, in the writing of the report, or in the decision to submit the article for publication.
Supplementary material
References
 Abdelbar, A.M., & Hedetniemi, S.M. (1998). Approximating MAPs for belief networks is NP-hard and other theorems. Artificial Intelligence, 102, 21–38.
 Adams, R., Shipp, S., Friston, K. (2013). Predictions not commands: active inference in the motor system. Brain Structure and Function, 218(3), 611–643.
 Arora, S., & Barak, B. (2009). Computational complexity: a modern approach. Cambridge: Cambridge University Press.
 Barlow, H.B. (1961). Possible principles underlying the transformation of sensory messages. In W.A. Rosenblith (Ed.) Sensory Communication (Vol. 3, pp. 217–234). Cambridge, MA: MIT Press.
 Bilmes, J. (2004). On virtual evidence and soft evidence in Bayesian networks. Tech. Rep. UWEETR-2004-0016, University of Washington, Department of Electrical Engineering.
 Blokpoel, M., Kwisthout, J., van Rooij, I. (2012). When can predictive brains be truly Bayesian? Frontiers in Theoretical and Philosophical Psychology, 3, 406.
 Blokpoel, M., Kwisthout, J., van der Weide, T., Wareham, T., van Rooij, I. (2013). A computational-level explanation of the speed of goal inference. Journal of Mathematical Psychology, 57(3–4), 117–133.
 Blokpoel, M., Wareham, H., Haselager, W., Toni, I., van Rooij, I. (2018). Deep analogical inference as the origin of hypotheses. Journal of Problem Solving, 11(1), 3.
 Bodlaender, H.L. (1993). A tourist guide through treewidth. Acta Cybernetica, 11, 1–21.
 Bossaerts, P., & Murawski, C. (2017). Computational complexity and human decision-making. Trends in Cognitive Sciences, 21(12), 917–929.
 Brown, H., & Friston, K. (2012). Free-energy and illusions: the Cornsweet effect. Frontiers in Psychology, 3, 43.
 Brown, H., Friston, K., Bestmann, S. (2011). Active inference, attention, and motor preparation. Frontiers in Psychology, 2(218), 1–9.
 Bruineberg, J., Kiverstein, J., Rietveld, E. (2018). The anticipating brain is not a scientist: the free-energy principle from an ecological-enactive perspective. Synthese, 195(6), 2417–2444.
 Buesing, L., Bill, J., Nessler, B., Maass, W. (2011). Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, 7(11), e1002211.
 Castillo, E., Gutiérrez, J., Hadi, A. (1997). Sensitivity analysis in discrete Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, 27, 412–423.
 Chater, N., Tenenbaum, J., Yuille, A. (2006). Probabilistic models of cognition: conceptual foundations. Trends in Cognitive Sciences, 10(7), 287–291.
 Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
 Clark, A. (2016). Surfing uncertainty: prediction, action, and the embodied mind. Oxford: Oxford University Press.
 Clementi, A., Rolim, J., Trevisan, L. (1998). Recent advances towards proving P=BPP. In E. Allender, A. Clementi, J. Rolim, L. Trevisan (Eds.) EATCS (p. 64).
 Cooper, G.F. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2), 393–405.
 Dagum, P., & Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60(1), 141–153.
 Darwiche, A. (2009). Modeling and reasoning with Bayesian networks. Cambridge: Cambridge University Press.
 Dayan, P., Hinton, G.E., Neal, R.M. (1995). The Helmholtz machine. Neural Computation, 7, 889–904.
 Den Ouden, H., Kok, P., De Lange, F. (2012). How prediction errors shape perception, attention, and motivation. Frontiers in Psychology, 3, e548.
 Donselaar, N. (2018). Parameterized hardness of active inference. In Proceedings of the international conference on probabilistic graphical models, PMLR (Vol. 72, pp. 109–120).
 Edwards, M., Adams, R., Brown, H., Pareés, I., Friston, K. (2012). A Bayesian account of ‘hysteria’. Brain, 135(11), 3495–3512.
 Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology, 590, 1–31.
 Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360, 815–836.
 Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11), e1000211.
 Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
 Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J., Penny, W. (2007). Variational free energy and the Laplace approximation. Neuroimage, 34, 220–234.
 Friston, K., Adams, R., Perrinet, L., Breakspear, M. (2012). Perceptions as hypotheses: Saccades as experiments. Frontiers in Psychology, 3, e151.
 Frixione, M. (2001). Tractable competence. Minds and Machines, 11, 379–397.
 Garey, M., & Johnson, D. (1979). Computers and intractability: A guide to the theory of NP-completeness. San Francisco, CA: W.H. Freeman and Co.
 Gigerenzer, G. (2008). Why heuristics work. Perspectives on Psychological Science, 3(1), 20–29.
 Gill, J.T. (1977). Computational complexity of probabilistic Turing machines. SIAM Journal on Computing, 6(4), 675–695.
 Goldreich, O. (2008). Computational complexity: a conceptual perspective. Cambridge: Cambridge University Press.
 Griffiths, T., Kemp, C., Tenenbaum, J. (2008). Bayesian models of cognition. In R. Sun (Ed.) The Cambridge handbook of computational cognitive modeling (pp. 59–100): Cambridge University Press.
 Griffiths, T., Chater, N., Kemp, C., Perfors, A., Tenenbaum, J. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357–364.
 Griffiths, T., Lieder, F., Goodman, N. (2015). Rational use of cognitive resources: levels of analysis between the computational and the algorithmic. Topics in Cognitive Science, 7, 217–229.
 Grush, R. (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377–442.
 Habenschuss, S., Jonke, Z., Maass, W. (2013). Stochastic computations in cortical microcircuit models. PLoS Computational Biology, 9(11), e1003037.
 Hamming, R. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29(2), 147–160.
 Hobson, J., & Friston, K. (2012). Waking and dreaming consciousness: Neurobiological and functional considerations. Progress in Neurobiology, 98(1), 82–98.
 Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
 Hohwy, J., Roepstorff, A., Friston, K. (2008). Predictive coding explains binocular rivalry: an epistemological review. Cognition, 108(3), 687–701.
 Horga, G., Schatz, K., Abi-Dargham, A., Peterson, B. (2014). Deficits in predictive coding underlie hallucinations in schizophrenia. The Journal of Neuroscience, 34(24), 8072–8082.
 Jeffrey, R. (1965). The logic of decision. New York: McGraw-Hill.
 Jehee, J., & Ballard, D. (2009). Predictive feedback can account for biphasic responses in the lateral geniculate nucleus. PLoS Computational Biology, 5, 1–10.
 Kant, I. (1999/1787). Critique of pure reason. The Cambridge edition of the Works of Immanuel Kant. Cambridge: Cambridge University Press.
 Kiiveri, H., Speed, T.P., Carlin, J.B. (1984). Recursive causal models. Journal of the Australian Mathematical Society Series A Pure mathematics, 36(1), 30–52.
 Kilner, J.M., Friston, K.J., Frith, C.D. (2007a). The mirror-neuron system: a Bayesian perspective. Neuroreport, 18, 619–623.
 Kilner, J.M., Friston, K.J., Frith, C.D. (2007b). Predictive coding: an account of the mirror neuron system. Cognitive Processing, 8, 159–166.
 Knill, D., & Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12), 712–719.
 Kostopoulos, D. (1991). An algorithm for the computation of binary logarithms. IEEE Transactions on Computers, 40(11), 1267–1270.
 Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86.
 Kwisthout, J. (2009). The computational complexity of probabilistic networks. PhD thesis, Faculty of Science, Utrecht University, The Netherlands.
 Kwisthout, J. (2011). Most probable explanations in Bayesian networks: complexity and tractability. International Journal of Approximate Reasoning, 52(9), 1452–1469.
 Kwisthout, J. (2014). Minimizing relative entropy in hierarchical predictive coding. In L. van der Gaag, & A. Feelders (Eds.) Proceedings of PGM’14, LNCS (Vol. 8754, pp. 254–270).
 Kwisthout, J. (2015). Treewidth and the computational complexity of MAP approximations in Bayesian networks. Journal of Artificial Intelligence Research, 53, 699–720.
 Kwisthout, J. (2018). Approximate inference in Bayesian networks: parameterized complexity results. International Journal of Approximate Reasoning, 93, 119–131.
 Kwisthout, J., & van der Gaag, L. (2008). The computational complexity of sensitivity analysis and parameter tuning. In D. Chickering, & J. Halpern (Eds.) Proceedings of the 24th conference on uncertainty in artificial intelligence (pp. 349–356): AUAI Press.
 Kwisthout, J., & van Rooij, I. (2013a). Bridging the gap between theory and practice of approximate Bayesian inference. Cognitive Systems Research, 24, 2–8.
 Kwisthout, J., & van Rooij, I. (2013b). Predictive coding: intractability hurdles that are yet to overcome [abstract]. In M. Knauff, M. Pauen, N. Sebanz, I. Wachsmuth (Eds.) Proceedings of the 35th annual conference of the cognitive science society. Austin, TX: Cognitive Science Society.
 Kwisthout, J., Wareham, T., van Rooij, I. (2011). Bayesian intractability is not an ailment approximation can cure. Cognitive Science, 35(5), 779–784.
 Kwisthout, J., Bekkering, H., van Rooij, I. (2017). To be precise, the details don’t matter: On predictive processing, precision, and level of detail of predictions. Brain and Cognition, 112, 84–91.
 Lee, T.S., & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20(7), 1434–1448.
 Lieder, F., & Griffiths, T.L. (2019). Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences. https://doi.org/10.1017/S0140525X1900061X.
 Littman, M.L., Goldsmith, J., Mundhenk, M. (1998). The computational complexity of probabilistic planning. Journal of Artificial Intelligence Research, 9, 1–36.
 Maass, W. (2014). Noise as a resource for computation and learning in networks of spiking neurons. Proceedings of the IEEE, 102(5), 860–880.
 Majithia, J.C., & Levan, D. (1973). A note on base-2 logarithm computations. Proceedings of the IEEE, 61(10), 1519–1520.
 Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman.
 Otworowska, M., Kwisthout, J., van Rooij, I. (2014). Counterfactual mathematics of counterfactual predictive models. Frontiers in Consciousness Research, 5, 801.
 Papadimitriou, C.H. (1994). Computational complexity. Reading: Addison-Wesley.
 Parberry, I. (1994). Circuit complexity and neural networks. Cambridge: MIT Press.
 Park, J.D., & Darwiche, A. (2004). Complexity results and approximation settings for MAP explanations. Journal of Artificial Intelligence Research, 21, 101–133.
 Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible inference. Palo Alto: Morgan Kaufmann.
 Pearl, J. (2000). Causality: models, reasoning and inference. Cambridge: MIT Press.
 Pecevski, D., Buesing, L., Maass, W. (2011). Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. PLoS Computational Biology, 7(12), 1–25.
 Pink-Hashkes, S., van Rooij, I., Kwisthout, J. (2017). Perception is in the details: a predictive coding account of the psychedelic phenomenon. In Proceedings of the 39th annual meeting of the cognitive science society (pp. 2907–2912).
 Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
 Rothen, N., Seth, A., Ward, J. (2018). Synesthesia improves sensory memory, when perceptual awareness is high. Vision Research, 153, 1–6.
 Seth, A. (2015). Presence, objecthood, and the phenomenology of predictive perception. Cognitive Neuroscience, 6(2–3), 111–117.
 Seth, A., & Tsakiris, M. (2018). Being a beast machine: the somatic basis of selfhood. Trends in Cognitive Sciences, 22(11), 969–981.
 Seth, A., Suzuki, K., Critchley, H. (2011). An interoceptive predictive coding model of conscious presence. Frontiers in Psychology, 2, e395.
 Shimony, S.E. (1994). Finding MAPs for belief networks is NP-hard. Artificial Intelligence, 68(2), 399–410.
 Sterzer, P., Adams, R., Fletcher, P., Frith, C., Lawrie, S., Muckli, L., Petrovic, P., Uhlhaas, P., Voss, M., Corlett, P. (2018). The predictive coding account of psychosis. Biological Psychiatry, 84(9), 634–643.
 Stockmeyer, L. (1977). The polynomial-time hierarchy. Theoretical Computer Science, 3, 1–22.
 Swanson, L. (2016). The predictive processing paradigm has roots in Kant. Frontiers in Systems Neuroscience, 10, 79.
 Tenenbaum, J.B. (2011). How to grow a mind: statistics, structure, and abstraction. Science, 331, 1279–1285.
 Thagard, P., & Verbeurgt, K. (1998). Coherence as constraint satisfaction. Cognitive Science, 22, 1–24.
 Thornton, C. (2016). Predictive processing is Turing complete: a new view of computation in the brain.
 Torán, J. (1991). Complexity classes defined by counting quantifiers. Journal of the ACM, 38(3), 752–773.
 Tsotsos, J. (1990). Analyzing vision at the complexity level. Behavioral and Brain Sciences, 13, 423–469.
 Van de Cruys, S., Evers, K., Van der Hallen, R., Van Eylen, L., Boets, B., de Wit, L., Wagemans, J. (2014). Precise minds in uncertain worlds: Predictive coding in autism. Psychological Review, 121(4), 649–675.
 van Rooij, I. (2008). The Tractable Cognition Thesis. Cognitive Science, 32, 939–984.
 van Rooij, I., Blokpoel, M., Kwisthout, J., Wareham, T. (2019). Cognition and intractability: a guide to classical and parameterized complexity analysis. Cambridge: Cambridge University Press.
 Vaseghi, S. (2000). Advanced digital signal processing and noise reduction (2nd ed.). New Jersey: Wiley.
 von Helmholtz, H. (1867). Handbuch der Physiologischen Optik. Leipzig: Leopold Voss.
 Wagner, K.W. (1986). The complexity of combinatorial problems with succinct input representation. Acta Informatica, 23, 325–356.
 Weilnhammer, V., Stuke, H., Hesselmann, G., Sterzer, P., Schmack, K. (2017). A predictive coding account of bistable perception: a model-based fMRI study. PLoS Computational Biology, 13(5), e1005536.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.