Testable or bust: theoretical lessons for predictive processing

The predictive processing (PP) account of action, cognition, and perception is one of the most influential approaches to unifying research in cognitive science. However, its promises of grand unification will remain unfulfilled unless the account becomes theoretically robust. In this paper, we focus on the empirical commitments of PP, since they are necessary both for its theoretical status to be established and for explanations of individual phenomena to be falsifiable. First, we argue that PP is a varied research tradition, which may employ various kinds of scientific representations (from theories to frameworks and toolboxes) that differ in the scope of empirical commitments they entail. Two major perspectives on PP qua cognitive theory may then be distinguished: generalized vs. hierarchical. The former fails to provide empirical detail, whereas the latter constrains possible physical implementations. However, we show that even hierarchical PP is insufficiently restrictive to disallow incorrect models and may be flexibly adjusted to explain any neurocognitive phenomenon, including non-existent or impossible ones. This renders PP a universal modeling tool with an unrestricted number of degrees of freedom. Therefore, contrary to the declarations of its proponents, it should not be understood as a unifying theoretical perspective, but as a computational framework, possibly informing further theory development in cognitive science.


Introduction
The predictive processing (henceforth: PP) approach to cognition (Clark, 2013) understands cognitive systems, whether biological or artificial, as systems that continuously infer the external causes of their sensory inputs. For this purpose, they implement prediction error minimization schemes in the form of internal hierarchical models streaming down predictions of sensory activations. Whenever predictions and inputs do not overlap, the models either re-parametrize to accommodate prediction errors (which enhances their predictive accuracy in the long run) or, in a more active form of inference, prescribe action policies that make sensory data fit the initial predictions (for recent primers on PP, see Hohwy, 2020a; Wiese & Metzinger, 2017).
Initially a successful approach in perception science (Clark, 2013), PP was recently extrapolated to a wide array of psychological and cognitive domains, such as interoception and emotion (Barrett, 2017; Seth, 2013), self and consciousness (Dołęga & Dewhurst, 2021; Seth & Hohwy, 2021; Woźniak, 2018), higher-order cognition (Pickering & Clark, 2014), psychopathology (Corlett et al., 2019; Sterzer et al., 2018), and many others (for a comprehensive list of philosophically-oriented research on PP, see Table S1 in Hohwy, 2020a). As a result, PP became one of the most influential approaches to unifying research in cognitive science: The rapidly growing number of models classified as PP gives one the impression that it could be the theory in cognitive science (Clark, 2016). At the same time, the exact claims of PP remain unclear, as do the methods of testing PP empirically, and even whether such testing is possible at all (see, e.g., Cao, 2020; Sun & Firestone, 2020).
The focus of this paper is on the empirical contents of PP. Importantly, the present inquiry does not stem from doubts about the scientific credentials of PP. We are not interested in demarcating science from pseudoscience, as Popper (1959) famously was. We assume at the outset that PP is scientific. Moreover, we are not worried that PP might comprise principles that are considered non-empirical, or a priori. 1 In the paper, we understand PP as a broad research tradition and discuss the whole spectrum of scientific representations under the PP label. Then we analyze them to identify those which can be assessed for falsifiability or testability. For the purpose of such assessment, we complement the Popperian understanding of falsifiability with the perspective of Taatgen (2003); that is, we expand the received notion of falsifiability, centered on counterexamples, to include the restrictive ability of theories to disallow incorrect models. Then, we argue that PP currently falls short of meeting the requirements for a well-founded theory. In essence, the promises of theoretical unification associated with PP will remain unfulfilled unless the account provides a rich theoretical understanding. We consider falsifiability a proxy for the ability of theories to provide an understanding of empirical phenomena.
The paper proceeds as follows. In the first section, we argue that PP is best understood as a fairly large and varied research tradition, which includes several distinct theories and a number of distinct computational implementations. We show that the scientific claims of PP are testable at various scales of abstraction, from the general assumptions of this tradition to models of individual phenomena. In the subsequent section, we distinguish two major perspectives on PP qua cognitive theory: generalized PP, which fails to provide crucial empirical detail, and hierarchical PP theories, which constrain possible physical implementations. Next, we argue that if hierarchical PP is tasked with providing a unified theory of cognition, it cannot be treated as a mere computational framework, that is, a universal modeling tool with an unbounded number of degrees of freedom. We embrace Niels Taatgen's insight that cognitive theories may have two kinds of problems: They can generate incorrect models, which describe what is empirically impossible, and they can have counterexamples, phenomena which are not predicted by the theory. The latter category is easier to address by ad hoc extensions than the former: If a theory is so flexible that it can predict just anything, it carries no empirical information. As we show, this is currently the case with PP, which therefore remains a mere computational framework, easily confused with a unifying theory. This framework may turn out unproductive for providing a deeper understanding of phenomena unless current modeling practices are revised.

PP and various kinds of scientific representations
In this paper, we view PP as a research tradition. The term research tradition was introduced by Larry Laudan, who refined the notion of a scientific research program coined by Lakatos (1970). As Lakatos (1970) observed, research programs are difficult to falsify, but they can be assessed as progressive or degenerate. However, his account of research programs requires that they be clearly divided into an immutable hard core, which defines their identity, and a "protective belt" of easily adaptable auxiliary hypotheses. But even for the research programs studied by Lakatos, the existence of such an immutable hard core is hardly plausible (Laudan, 1977). This is why we follow Laudan's proposal to analyze the historical dynamics of research in terms of research traditions, and we see PP as a research tradition. Some bet that it is progressive (Clark, 2013, 2016; Hohwy, 2020a; Seth & Hohwy, 2021), yet there are symptoms of stagnation (Cao, 2020; Litwin & Miłkowski, 2020; Sun & Firestone, 2020).
Research traditions are characterized by three features: (a) "Every research tradition has a number of specific theories which exemplify and partially constitute it" (Laudan, 1977, p. 78); (b) they exhibit "certain metaphysical and methodological commitments which, as an ensemble, individuate the research tradition and distinguish it from others" (ibid.); (c) lastly, traditions go through a number of formulations and usually have a long history (ibid., p. 79). Let us apply this definition to PP: There are several distinct specific theories that constitute this tradition. Clark (2016) defended the view that, even though PP originated from classical representational cognitive science, it embraces crucial insights of extended and embodied approaches, since cognitive control is largely realized through active inference. In contrast, Hohwy (2013, 2016) defended a fairly internalist, neurocentric, and representationalist version of PP. Finally, there is a FEP-based version, dubbed the Hierarchically Mechanistic Mind, defended by Karl Friston and his collaborators as a proper model of neural architecture. These three theories are quite distinct and sometimes even mutually exclusive: Hohwy's internalism may be difficult to reconcile with Clark's externalism, and is utterly incompatible with radical externalist versions of PP (Bruineberg et al., 2016; Kirchhoff & Robertson, 2018). At the same time, all these theories may be considered part of a broader research tradition, that of Bayesian accounts of cognition (Chater et al., 2010). Multiple traditions may intersect in various ways, conspiring to create many variants of similar approaches.
Proponents of PP refer to it in various ways, interchangeably calling it a theory (Clark, 2013, p. 186), a framework (Clark, 2013, p. 201), or a model (Clark, 2013, p. 181). Hence, it remains difficult to understand exactly what is meant when one talks of the testability or falsifiability of PP simpliciter. Let us distinguish several kinds of scientific representations in computational (neuro)cognitive science to clarify this issue. First, there are computational toolboxes that come almost without theoretical glosses: Think of a MATLAB module. These can include various implementations of distinct predictive coding algorithms (Spratling, 2017). Second, there are computational frameworks, which can be used to produce particular models: Think of dedicated programming languages or generic cognitive architectures. Third, there are theories that can be stated in more or less abstract fashion, in terms of verbal descriptions, flowcharts, equations, axioms, etc. These usually embody general principles used to systematically understand a domain of phenomena, which are then explained by producing particular models.
There are multiple many-to-many mappings between these entities: There could be various PP algorithms, different implementations of the same PP algorithm for particular computational platforms, and the same algorithm or its implementation could be used by researchers who subscribe to conflicting theories. Alas, this implies that the proliferation of PP models cannot be easily understood as the growing prominence of a single theory. The sheer number of combinations of these various kinds of PP representations may contribute to a combinatorial explosion of diverse approaches within a research tradition. For example, the PP algorithm known as "active inference" may underlie models developed under divergent assumptions, e.g., (1) in the internalist take on PP, active inference is just yet another part of a wider inferential process carried out by the brain (Hohwy, 2016), whereas (2) in the enactive approach to PP, active inference is all that the generative model, as a control system, ever does (Ramstead et al., 2020). Active inference is also (3) the main ingredient of a separate process theory that aims to explain neural dynamics as well as the learning and behavior of living organisms. In addition, active inference comes in several mathematical formulations because the framework is constantly evolving (Da Costa et al., 2020).
Importantly, these various kinds of scientific representations in computational cognitive (neuro)science may have different empirical commitments. While toolboxes may carry little to no theoretical interpretation, the tool development may influence theory development (Gigerenzer, 1991, 1992). In the case of PP, there are several ways that tools impact theories: Cognitive processes are easily modeled in terms of top-down inferential processes that depend on generative models and prediction errors. Consider a modeler's idea that it might be useful to model a given mental disorder with the use of a PP algorithm. The modeler must only "fill in the blanks" (i.e., define the nodes and the ways to perturb the inferential process) to get a preliminary understanding of the cognitive phenomenon in question: The task of the researcher is to re-describe a phenomenon (e.g., a psychiatric disorder) in terms of the toolbox (e.g., a generic PP algorithm). As such, however, toolboxes need not be realistically understood and are not directly testable; only individual models can be tested. Moreover, tools may be used in a non-standard fashion and need not conform to theoretical constraints. For example, one could use original predictive coding algorithms (Cutler, 1952) to compress video files and ignore the neuroscientific uses of their later incarnations.
Frameworks carry a bit more theoretical burden. Consider that Allen Newell characterized them as "conceptual structures (often computer systems) that are reputed to be relatively content free, but that permit specific cognitive theories to be inserted in them in some fashion" (Newell, 1990, pp. 16-17). The relative lack of content implies that they, too, are theoretically thin. However, in contrast to tools, there are certain research exemplars one could follow to produce standard cognitive theories. For example, nosological descriptions of psychopathologies may be formalized as various forms of computational failures within the generic PP scheme (Friston et al., 2014), which creates a PP framework that may be used to generate computational models of individual mental disorders or psychopathological phenomena (e.g., Powers et al., 2017). Such a computational framework does not restrict its uses, and it is up to the modelers to interpret parts of the model in theoretical (e.g., neurobiological or functional) predictive coding terms. Thus, what is empirically tested, again, are particular cognitive models rather than frameworks, which can be treated instrumentally and interpreted in various ways. The elbow room for such interpretations is relatively large, and by itself the PP framework may have little significance for empirical testing. However, it may still be assessed against theoretical computational constraints such as computational tractability (Kwisthout & van Rooij, 2020), even if its interpretations are ambiguous.
In contrast, theories wear their commitments on their sleeves. While any theory goes beyond mere empirical content and remains empirically indeterminate, it should be at least partially falsifiable. The issue of falsifiability and testing in general is further complicated by the Duhem-Quine thesis, which states that no single theoretical proposition can be separately verified or falsified empirically. This is because theories are considered to be structures that are tested as complete wholes.
In the case of cognitive (neuro)science, the problem is even graver because it is not at all clear what cognitive theories are and how they are represented. Only some are stated in natural language (e.g., the theory of thinking through metaphors: Lakoff & Johnson, 1980), while others are better represented by flow-charts or diagrams with verbal comments (e.g., the filter model of attention, Broadbent, 1958), by mathematical descriptions (e.g., the account of vision, Marr, 1982), or in terms of formal specification languages (for an extended analysis, see Cooper & Guest, 2014). Devoid of clear identity, they easily mesh with the background assumptions of other theories. Flowcharts or diagrams make their interpretation in simple propositional terms difficult. What is yet more problematic is that it is virtually impossible to distinguish theoretical assumptions from implementation details. This is a notorious problem for computational models in cognitive (neuro)science (Cooper & Guest, 2014; Cooper & Shallice, 1995). Mere implementations of computer code are not cognitive theories. Such implementations leave theories highly underspecified: It is unclear which parts of the code one should include and which are just helper modules or ad hoc additions (Frijda, 1967). This underspecification is particularly true of predictive coding, which has several extant implementations. For example, only some of them presume that neurons encode Gaussian probability distributions. But is this a theoretical assumption or simply part of the implementation of predictive coding? Even proponents of PP differ in their treatment of some such cases (Spratling, 2013).
As we observed elsewhere (Litwin & Miłkowski, 2020), all this contributes to the unclear theoretical fundamentals of PP. Not only are there many distinct versions of PP theory, but even a single version may be difficult to interpret in non-mathematical terms. For example, it remains unclear what psychological process or property is supposed to correspond to the mathematical property dubbed "precision" (ibid.). These difficulties notwithstanding, PP theories do have empirical commitments (as we show below) and, in spite of all their indeterminacies and ambiguities, they feature several assumptions that are empirically falsifiable.
Finally, PP models address individual phenomena. These models are computational (even if only verbally sketched), and as such, they may play explanatory or predictive roles, which is typical for cognitive (neuro)science (Miłkowski, 2013; Piccinini, 2020). As Morrison & Morgan (1999) noted in their analysis of models in science, models can function, and be constructed, autonomously from theories. This is typical of PP models, which cannot be derived entirely from the underspecified theory. One could even doubt whether the growth of the PP research tradition is theory-driven, and suggest that the whole tradition should be understood as an organically growing collection of particular models. This doubt could go hand in hand with the rejection of the unificatory ambitions of some PP defenders: One could embrace particular PP models without assuming that they could become the unified theory of cognition any time soon.
In our view, the development of particular models and the refinement of theoretical principles are best intertwined, generating a productive tension that could spark further scientific progress. As is usually the case, theoretical underpinnings are difficult to establish and revise, while pragmatic considerations drive modelers to repurpose theoretical instruments toward new ends. This may, however, lead to ambiguities in how the fundamental theoretical constructs are understood and what the theory actually states. Unfortunately, this may contribute to the demise of the whole theoretical edifice in the flurry of modeling work.

The protean nature of PP
The core issue for falsifying a given theory is to establish its empirical commitments. This is not at all easy for PP. "Predictive processing" is used to refer both to a broad class of "top-down" theories of cognition and to more specific theories based on hierarchical predictive processing posited to occur in the cortex of the brain, which is supposed to implement or approximate some form of Bayesian inference.
The first class was characterized as "generalized predictive coding" (Bastos et al., 2012). In a nutshell, "generalized predictive coding" refers to any learning scheme that updates the way it generates its output values in response to observed errors in the previously generated values. This happens if and only if expected values generated from the extant data model (usually dubbed "generative model") can be compared against some ground truth, and then drive corrections in the model (in the Appendix, we provide more mathematical details about this procedure). For example, this is the case for any kind of gradient descent learning in AI, but also for symbolic machine learning methods. Indeed, the notion of generalized predictive coding is broad: As Bastos et al. note, "It grandfathers nearly every known statistical estimation scheme, under parametric assumptions about additive noise" (Bastos et al., 2012, p. 703). As such, it covers emulation models based on Kalman filters, proposed by Grush (2003), but also any model-driven cognitive capacity modulated by feedback errors, including, for example, model-based reinforcement learning (Dayan & Berridge, 2014). In this, PP simply cannibalizes most top-down accounts of cognition, because they are predominantly model-based and require error corrections. But by itself, this version of PP also brings little, if any, new theoretical insights about cognition.
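To make the breadth of this definition concrete, the error-driven loop that generalized predictive coding covers can be sketched in a few lines. The following is our own toy illustration, not drawn from any published PP model; all names and values are hypothetical. A single-parameter generative model predicts its sensory input and re-parametrizes by gradient descent on the prediction error.

```python
import numpy as np

# Minimal illustration of "generalized predictive coding" in the broad
# sense: a generative model predicts its input, and the observed
# prediction error drives parameter updates (plain gradient descent on
# squared error). The "model" is deliberately trivial: one scalar
# estimate `mu` of a noisy hidden cause. (Hypothetical toy example.)

rng = np.random.default_rng(0)
true_cause = 3.0   # hidden cause of the sensory input
mu = 0.0           # the model's current estimate
lr = 0.1           # learning rate

for _ in range(200):
    sensory_input = true_cause + rng.normal(0.0, 0.5)
    prediction = mu                       # "top-down" prediction
    error = sensory_input - prediction    # "bottom-up" prediction error
    mu += lr * error                      # re-parametrize to reduce error

# The estimate now approximates the hidden cause (roughly 3.0).
```

On this permissive reading, any gradient-based learner, a Kalman filter, or a model-based reinforcement learning agent instantiates essentially the same loop, which is precisely why the definition grandfathers nearly every estimation scheme.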
But is this version falsifiable? Let us take generalized PP to be a unified theory of cognition: This theory can be falsified by a single counterexample to the claim that cognition requires generative models that are adapted over time by applying corrections. Take animal taxes as driven by the increasing magnitude of tracked sensory stimuli, but devoid of any generative models. For example, crickets follow the sounds made by their conspecifics, but it is unlikely that they use generative models in their phonotaxis, even if their sensory processing is generally modulated by forward models (Webb, 2004). All that sensory registration requires is that the cricket is differentially sensitive to sound (Burge, 2010), which implies the cricket's sensitivity to the information about the stimulus. Under the received interpretation, bottom-up signals are stimulus-related, for example, indicating the position of a chirping male to a female cricket. As they are acquired by the female nervous system, they make some specific difference to the downstream neural firing (i.e., Shannon information carried by sensory signals must make some causal difference if the stimulus is received). Additionally, the received interpretation includes the existence of feedback top-down modulation, which is necessary, for example, for the usual perceptual phenomena, such as habituation, to occur. Thus, there must be both bottom-up and top-down signal pathways in the cricket auditory system. A defender of the PP account of this perceptual architecture might, however, switch the labels on pathways, by saying that the bottom-up pathway carries error information related to the predicted cause of the sound, i.e., the position of a male chirping cricket, and the top-down pathway provides predictions of future error input based on previously received error information. 
Then, the empirical evidence that allows us to ascribe cricket neurons the function to carry information about the sound is also evidence for the generalized PP reinterpretation. This move allows a PP theorist to cast their net very wide, unfortunately at the risk of trivializing the theory (Cao, 2020).
But there is a second way to think of PP, which is much more constrained: One supplements generalized predictive coding with a formal specification of hierarchical message passing and stipulates that PP formal constructs correspond to (roughly) Bayesian concepts such as precision and conditional expectations about (hidden) causes. Hierarchical PP cannot specify one particular algorithm, as there are many (Spratling, 2017). Importantly, the predictive coding scheme as envisioned by PP necessarily entails a hierarchical organization of the inferential machinery, as nicely summarized by the defenders of the Hierarchically Mechanistic Mind (HMM) hypothesis: "In particular, the HMM relies on the directly testable second-order hypothesis that the brain minimises prediction error via hierarchical message passing in the brain (i.e., predictive coding …), which has already been demonstrated experimentally by studies of visual processing" (Badcock et al., 2019, p. 115).
While there are non-hierarchical predictive coding algorithms, as with linear predictive coding in signal processing, these are not normally considered to be part of the core PP theory of cognition (e.g., Clark 2016). Thus, the overall idea of the hierarchical PP theory is that the brain implements one of the predictive coding algorithms in a bidirectional hierarchical manner, where top-down signals represent predictions, bottom-up signals represent prediction errors, and two functionally distinct units represent predictions and prediction errors at each level of the inferential hierarchy. In other words, this definition does not include all top-down accounts of cognitive processes. There remain relevant explanatory alternatives to be considered when assessing PP explanations.
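The extra commitments of hierarchical PP can be made vivid with a toy sketch. This is our own construction, not a published algorithm, and all variable names are hypothetical: two latent levels, each carrying an expectation and an error signal, where top-down signals convey predictions, bottom-up signals convey prediction errors, and every update uses only signals from adjacent levels.

```python
# Toy two-level hierarchical predictive coding loop: separate prediction
# and error signals at each level, with communication restricted to
# adjacent levels, as the hierarchical PP theory requires.
# (Hypothetical sketch; not a published model.)

x = 2.0      # sensory input clamped at the lowest level
mu1 = 0.0    # level-1 expectation (predicts the sensory input)
mu2 = 0.0    # level-2 expectation (predicts level-1 activity)
lr = 0.05    # update rate

for _ in range(500):
    e0 = x - mu1     # bottom-up error: input vs. level-1 prediction
    e1 = mu1 - mu2   # bottom-up error: level 1 vs. level-2 prediction
    # Each expectation is driven only by errors at adjacent levels:
    mu1 += lr * (e0 - e1)   # error from below minus error sent upward
    mu2 += lr * e1          # top level absorbs the remaining error

# At equilibrium both errors vanish, so mu1 and mu2 settle at the
# input's value (here 2.0).
```

Even this minimal sketch exposes the theory's architectural commitments: distinct prediction and error signals at every level and strictly adjacent-level message passing, which is exactly what makes the hierarchical version empirically non-trivial.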
This flavor of the PP theory is more empirically informative, but this comes at a cost. To see this, it is instructive to compare it to other theories that hypothesize the propagation of error information in the brain. Initially, the proposed mathematical formalism of PP could not be mapped onto concrete neural mechanisms (Kogo & Trengove, 2015). This problem was later addressed (Keller & Mrsic-Flogel, 2018), but the proposed solution imposes additional commitments on the architecture of canonical microcircuits (the existence of type I and type II prediction error neurons encoding, respectively, positive and negative prediction errors) while maintaining certain biologically implausible features (Spratling, 2019). Similarly, in some (but not all!) respects, hierarchical PP is less biologically plausible than other approximations of the back-propagation algorithm: Predictive coding implies implausible neuronal connectivity, such as one-to-one connections of error nodes to their corresponding value nodes, which is "inconsistent with diffused patterns of neuronal connectivity in the cortex" (Whittington & Bogacz, 2019, p. 240). Finally, extant PP algorithms assume that direct communication occurs only between adjacent levels in the inferential hierarchy (Litwin & Miłkowski, 2020) and that top-down connections are necessarily inhibitory (Denève & Jardri, 2016), which also seems questionable given existing knowledge of the heterarchical functional anatomy of the brain (Bechtel, 2019; Pessoa, 2017, 2019). We do not refer to the above examples to pronounce the verdict that PP is thereby disproved. What we want to stress is that hierarchical PP is not devoid of empirical content and, as such, may be subject to theoretical debate, cross-model comparisons, and empirically-informed revisions generating novel predictions.
For example, one may argue that top-down connections can exert excitatory influences through the suppression of inhibitory interneurons at the lower level (Kanai et al., 2015;Keller & Mrsic-Flogel, 2018). The upshot is that, while as yet we have no unequivocal neurophysiological evidence for or against hierarchical PP (Walsh et al., 2020), there are possible counterexamples that it should face, which in principle renders it a falsifiable theory.

Explaining just the right thing
If hierarchical PP has essential empirical commitments, it should be understood as a theory rather than a computational framework. Then, the question is whether PP has the required virtues of the theory, one of which is empirical testability. In particular, PP may be assessed for falsifiability, which we understand in the Taatgenian (2003) sense: The problems of a theory can be found in two categories: counterexamples, phenomena that are possible in reality but are not predicted by the theory, and incorrect models, predictions of the theory that are not possible in reality. The issue of incorrect models is especially important, because an unrestricted Turing Machine is potentially capable of predicting any conceivable cognitive phenomenon (Taatgen, 2003, p. 622).
The traditional ("Popperian") approach to falsifiability is to give counterexamples, that is, phenomena that a theory cannot easily explain. Accordingly, existing critical positions focus on individual cognitive phenomena problematic for PP, such as motivation, a.k.a. "the dark room problem" (Klein, 2018; Sun & Firestone, 2020), thought (Williams, 2020), or delusional beliefs (Williams, 2018). However, even if PP is incorrect in particular cases, such incorrectness is not exactly telltale. After all, no other cognitive theory can explain everything, and defenders of PP can readily admit that this also applies to PP (see, e.g., Seth et al., 2020), responding that the coverage of explained phenomena grows dramatically over time. While we argued elsewhere that the growth in question is only apparent (Litwin & Miłkowski, 2020), we do not claim that there is absolutely no growth in coverage; it is only much slower than usually touted.
Incorrect models, even though they have been out of the spotlight in the context of falsifiability, are much more important than counterexamples, because they show that a theory is insufficiently restrictive to disallow false models. Such an overly expressive theory shares the fate of an unrestricted universal Turing machine, since it can produce any model, even for cognitively impossible phenomena. There is also a danger in overfitting already obtained data: A theory which merely accommodates any possible result, and morphs as soon as we gather new information, has little-to-no predictive value. Consider the following example from the literature: "The statistical independence between the identity and location of objects in the visual world suggests an anatomical dissociation between models or representations of the 'what' and 'where' attributes of (hidden) causes of visual input (i.e., knowing what an object is does not tell you where it is). This is precisely what we see in the distinction between the ventral ('what') and dorsal ('where') streams in the cortical hierarchy" (p. 1330). The dissociation between ventral and dorsal streams is frequently cited as an example of adaptive functional segregation that mirrors the causal structure of the environment, as envisioned by PP (Friston, 2013; Friston & Buzsáki, 2016). It is also argued that this neuroanatomical solution improves the computational efficiency of the inferential hierarchy. The explanation has an immediate intuitive charm; however, to mitigate the risk of its being incorrect, PP should also impose, in advance, clear theoretical constraints on the number, reducibility, and degree of independence of the factors present in the sensory input. The number of specialized sensory pathways or areas is possibly smaller than the number of orthogonal factors in the sensory stream.
If theoretical claims are tailored to the evidence post hoc, a remarkable fit between what the theory would expect and data can easily disappear as soon as new empirical evidence arrives.
In a recent paper, Pitcher & Ungerleider (2021) provide neuroanatomical evidence that the two-pathway model of visual processing is outdated, and they call for its revision. They describe another anatomically segregated pathway whose functional properties distinctively pertain to dynamic aspects of social perception, such as the recognition of bodily movement and facial expressions. This is obviously problematic for the original PP claim. Hindsight is 20/20, and, certainly, the existence of three visual pathways, dedicated separately to statistically independent "what," "where," and "[social] who," is perfectly reasonable. However, even though the list of independent factors in the visual input may always be supplemented by new additions, this process reveals incorrect models along the way and exposes the unidirectional revision dynamics of PP: post-hoc fitting without new predictions or a priori constraints.
There are also reasons to believe that the "statistical independence" hypothesis is incorrect in general. The third visual pathway shares input properties (representations) with the ventral ("what") pathway, as it also processes bodies, faces, and motion. And it does so independently: Pitcher & Ungerleider (2021) provide an example of prosopagnosic patients whose superior temporal sulci (STS) exhibited regular face-selective responses despite lesions in face-selective areas in the ventral pathway. The existence of two decoupled yet co-selective areas in distinct visual pathways breaks the link between statistical independence and functional specialization. It also calls into question whether simple factorization aptly captures functional nuances: "It is clear that while 'what', 'where', and 'how' can describe the many facets of visual object recognition, these terms are wholly inadequate when it comes to describing the complexity and nuances of even basic social interactions. There is no simple one-word description that can encompass the functions of the third visual pathway. Rather, it appears that the visual input into the STS is integrated with other sensory inputs to enable primates to understand and interpret the actions of others" (Pitcher & Ungerleider, 2021, p. 9). Finally, the identity and location of an object are statistically independent irrespective of the way they are detected, but this functional segregation does not seem to hold across all sensory modalities. The two existing olfactory pathways are not those of "what" and "where," but of sensing the environment and of finding a receptive mate (Firestein, 2001). Three distinct processing streams have also been proposed in the somatosensory domain, although their functions remain to be defined (Saadon-Grosman et al., 2020).
The two visual pathways "hypothesis" leads astray from whichever angle we scrutinize it, and illustrates a more general propensity of PP for the virtually unrestricted generation of incorrect models. This propensity is not limited to the domain of neural implementation: while PP specifies a universal algorithm of unsupervised learning, applicable to a wide variety of cognitive systems, it does not put meaningful constraints on the behavioral and cognitive patterns that can evolve. According to the complete class theorem, any decision can be cast as Bayesian given a proper loss function and a set of prior beliefs, at least for finite probability spaces (Robert, 2007). The complete class theorem has repeatedly been argued (e.g., Friston, 2017; Parr et al., 2018) to be the foundation for countless PP insights into abnormal behaviors as "idiosyncratic rationalities":

(...) there is always a set of prior preferences that renders any behaviour (approximately) Bayes optimal. At first glance, this may seem disappointing; however, turning the argument on its head, the complete class theorem means that we can always characterise behaviour in terms of prior preferences. (Friston et al., 2016, p. 876)

However, by turning the argument on its head, we may render it invalid, which seems to be the case here. The complete class theorem is a recipe for an unrestricted modeling tool rather than for informative theoretical contributions (Jones & Love, 2011; Bowers & Davis, 2012). Deriving models from individual datasets may result in non-generalizable models, biased estimates, and erroneous conclusions, especially when multiple distinct prior-cost function pairs yield equally optimal fits (Friston, 2011). Flagging all (possible) behaviors as rational under very particular circumstances opens up plenty of flexible re-parameterization options, which weakens the theory (cf. Taatgen, 2003), especially if we also consider that prediction error minimization is detached from the temporal constraints of the system's performance. PP operates over appropriately long-term timeframes (cf. Hohwy, 2020a; Seth et al., 2020), which means that any locally suboptimal behavior can appear optimal from an appropriately extended perspective.
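The permissiveness of the complete class theorem can be made concrete with a toy sketch. The numbers below are our own illustrative choices, not taken from the PP literature: given two world states and three actions under an arbitrary loss matrix, every action turns out to be Bayes-optimal under some prior, so observing any choice tells us little until the prior-loss pair is independently constrained.

```python
import numpy as np

# Toy decision problem (illustrative numbers only): two world states,
# three actions, and a loss matrix L[action, state].
L = np.array([[0.0, 10.0],    # risky action, good in state 0
              [10.0, 0.0],    # its mirror image, good in state 1
              [4.0, 4.0]])    # a cautious, constant-loss action

def bayes_action(p_state0):
    """Return the action minimizing expected loss under prior P(state 0) = p."""
    prior = np.array([p_state0, 1.0 - p_state0])
    return int(np.argmin(L @ prior))

def rationalizing_prior(action, grid=np.linspace(0.0, 1.0, 1001)):
    """Find some prior under which `action` is Bayes-optimal, if one exists."""
    for p in grid:
        if bayes_action(p) == action:
            return float(p)
    return None

# Every action is "optimal" under some prior, so any observed choice can be
# re-described after the fact as (approximately) Bayes-optimal behavior.
for a in range(3):
    print(f"action {a} is Bayes-optimal under prior P(state 0) = "
          f"{rationalizing_prior(a):.3f}")
```

Note that the construction also works in reverse: distinct prior-loss pairs can "rationalize" the same observed choice equally well, which is precisely the identifiability worry raised above.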
By means of the complete class theorem, PP may even necessitate needless reconceptualizations of relatively well-understood phenomena (Jurjako, 2022). Consider self-deception: a motivationally biased process of belief formation in which beliefs evolve in the direction opposite to the one the evidence points to. PP takes such cases as instances of idiosyncratic rationalities (given a unique history of internal model development), doing away with a palpable phenomenal distinction between beliefs and desires simply because it does not differentiate between them at the algorithmic level (Jurjako, 2022; Williams, 2021). A variety of forms of motivated and unmotivated reasoning is aggregated under a homogeneous category of optimality. Such a broad category lacks explanatory force, offers few valuable insights (for a related point, see Sun & Firestone, 2020), and is actually counterproductive, as it trivializes the explanations that we already have (for a comprehensive review of the many different shades of self-deception, see Butterworth et al., 2022).
Summing up, PP can potentially generate a vast set of incorrect models. Its enormous flexibility results in the pitfall of "predicting" phenomena as envisioned by the current scientific consensus, including those that will turn out (or have already turned out) to be incorrect. Moreover, any behavior may be rendered optimal given a proper temporal horizon or prior-cost function coupling. Thus, hierarchical PP lacks the tools for a prior assessment of model plausibility and appears to be a computational framework under the guise of a theory. It shares the fate of other computational frameworks that are not empirically informative in themselves; even a good fit to the evidence does not necessarily make a framework valuable (Roberts & Pashler, 2000). Frameworks may be productive if they inspire novel predictions and new lines of research, or unproductive if they generate models in constant need of ad hoc patching that offer few scientific insights (Griffiths et al., 2012). It is up to PP modelers alone to decide which way they go.
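The point that a good fit is cheap for a sufficiently flexible model can be illustrated with a deliberately simple, non-PP example (all numbers are our own): a polynomial with as many free parameters as data points fits its training sample essentially perfectly, yet the quality of that fit says nothing about how the model fares on new observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data from a plain linear process with a little noise.
x_train = np.linspace(0.0, 1.0, 8)
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 8)
y_test = 2.0 * x_test + rng.normal(0.0, 0.1, size=x_test.size)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train_mse, test_mse

# Eight free parameters for eight data points: the training fit is
# essentially perfect, regardless of how the model generalizes.
for degree in (1, 7):
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree {degree}: train MSE = {train_mse:.2e}, "
          f"test MSE = {test_mse:.2e}")
```

An unrestricted number of degrees of freedom guarantees a good fit to any individual dataset; the test for a framework is therefore not fit, but the novel, risky predictions it licenses.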

Conclusion
Although it emerged relatively recently, predictive processing is already a vast research tradition, expressed in distinct scientific representations, spanning from particular models, through (more or less general) theories, to computational frameworks and toolboxes. To determine its theoretical status, we focused on the testability and empirical commitments of the two main approaches to PP qua theory: its generalized and hierarchical versions. Generalized PP is broad enough to encompass any model-based account, and as such does not yield many informative insights. Hierarchical PP, on the other hand, does provide some preliminary constraints on possible implementations, which makes it a better candidate for an overarching theory of cognitive science, as frequently advertised.
As we have shown, in contrast with what its proponents declare, hierarchical PP is currently used as a mere framework: a computationally universal yet theoretically agnostic modeling tool. It can hardly be empirically disproved, as it may be flexibly adapted to explain any phenomenon, including ones which actually do not (or cannot) exist. This is because hierarchical PP lacks essential detail regarding the functioning of cognitive systems, and particular PP models can easily be customized through ad hoc assumptions and parameter adjustments. While this is common malpractice in computational modeling in cognitive (neuro)science, we urge PP defenders to fill in the important details of their grand theory. Otherwise, it may turn out to be just a series of uninformative platitudes which, even if falsifiable, would not offer satisfactory new understanding in cognitive science.
Focusing on testability also allows us to point out that more attention should be paid to the refinement of the hierarchical message passing theory, because this is what makes PP experimentally contentful. Moreover, models should explicitly specify the properties of their explananda in a theoretically motivated fashion and provide comparisons with alternative models. These standards may seem difficult to fulfill in modeling practice, but failure to do so may prove detrimental to the whole research tradition. In short, where PP needs more work is not so much in mathematical finesse as in the robust development of its theoretical underpinnings.

Appendix
The underlying idea of generalized PP may be summarized by sketching its algorithm in neural network terms, adapted from Whittington & Bogacz (2017, p. 1243). Bars above variables denote vectors (e.g., x̄). The output layer of the artificial neural network is indexed 0, and the input layer l_max. The training sample consists of pairs of training vectors s_in and s_out, which are iteratively presented to the network.
The algorithm iterates over the training data ("for all Data do"), relaxing the network's variables to a steady state for each training pair and then adjusting the weights; θ^(a)_{i,b} denote the weights of synaptic connections, and f' is the derivative of a certain non-linear node activation function f. (The original pseudocode and equations (1)-(3), including the synaptic weight update given in equation (3), are not reproduced here.) Variables in steady state, i.e., after convergence is achieved, are denoted with an asterisk (e.g., F*). While the pseudocode and all three equations were put forward in a neural network context (as a proposal for a predictive coding approximation of the backpropagation algorithm with local learning rules), they can easily be understood more broadly, as applying to any machine learning scheme based on gradient descent. Generally, any array data structure will do as well; the core of the setup consists of chosen functions for updating variable values in an array over time (e.g., even the ordinary least squares method for linear regression can be implemented this way). An artificial neural network operating in accordance with this description learns a generative model of the data, conditioned on the input and encoded in the connection weights between the layers.
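To make the description above concrete, here is a minimal NumPy sketch of a predictive coding network of this kind. It is our own reconstruction under the stated conventions (layer 0 as output, the last layer as input), not the original algorithm of Whittington & Bogacz (2017); the network sizes, toy data, learning rate, and relaxation schedule are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.tanh
def f_prime(x):
    return 1.0 - np.tanh(x) ** 2

# Layer sizes, following the convention above: index 0 is the output layer,
# the last index the input layer. All sizes here are toy choices.
sizes = [2, 4, 3]
W = [rng.normal(0.0, 0.5, (sizes[l], sizes[l + 1]))  # W[l] predicts layer l
     for l in range(len(sizes) - 1)]

def relax(x, n_steps=60, dt=0.1):
    """Relax hidden activities by gradient descent on the free energy
    F = sum_l ||x[l] - W[l] f(x[l+1])||^2 / 2, input and output clamped."""
    for _ in range(n_steps):
        eps = [x[l] - W[l] @ f(x[l + 1]) for l in range(len(W))]
        for l in range(1, len(x) - 1):
            x[l] = x[l] + dt * (-eps[l]
                                + f_prime(x[l]) * (W[l - 1].T @ eps[l - 1]))
    return x

def train_step(s_in, s_out, lr=0.01):
    """One presentation of (s_in, s_out): relax to steady state, then apply
    the local weight update Delta W[l] = lr * eps*[l] f(x*[l+1])^T."""
    x = [s_out.copy()] + [np.zeros(n) for n in sizes[1:-1]] + [s_in.copy()]
    x = relax(x)
    for l in range(len(W)):
        eps_l = x[l] - W[l] @ f(x[l + 1])
        W[l] += lr * np.outer(eps_l, f(x[l + 1]))

def predict(s_in):
    """Propagate an input through the learned generative weights."""
    a = s_in
    for l in reversed(range(len(W))):
        a = W[l] @ f(a)
    return a

# Toy training data: a fixed random non-linear mapping (illustrative only).
A = rng.normal(0.0, 1.0, (2, 3))
data = [(s, A @ np.tanh(s)) for s in rng.normal(0.0, 1.0, (40, 3))]

def mean_sq_error():
    return float(np.mean([(predict(s) - t) ** 2 for s, t in data]))

err_before = mean_sq_error()
for _ in range(150):
    for s_in, s_out in data:
        train_step(s_in, s_out)
err_after = mean_sq_error()
print(f"MSE before: {err_before:.3f}, after: {err_after:.3f}")
```

The sketch illustrates the two nested update loops characteristic of the scheme: inference (relaxation of activities toward a steady state) and learning (local, Hebbian-like weight changes computed from steady-state prediction errors), with no global error signal propagated as in standard backpropagation.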