1 Introduction

The prediction error minimization framework (henceforth, PEM) aims to offer a unified explanation of the processes underlying perception and cognition by postulating that all cognitive functions can be reduced to one kind of process: minimizing prediction error. It has recently been popularised in the philosophical literature by Hohwy (2013) and Clark (2013, 2016), and has been increasingly gaining traction in both empirically informed philosophy of mind and cognitive (neuro)science. One of the reasons for PEM’s widespread adoption is the claim that the framework can deliver a unified theory of all cognitive functions. In the words of Clark, it is “the first truly unified account of perception, cognition, and action” (Clark 2016, p. 2). Similarly, Hohwy claims that PEM is “meant to explain perception, and action, and everything mental in between” (Hohwy 2013, p. 1). Unsurprisingly, the lofty unificatory and explanatory ambitions of the framework have come under much scrutiny (Colombo and Hartmann 2017; Gładziejewski 2019). However, there has been relatively little focus on the fact that, in order to offer a truly exhaustive account of our mental lives, PEM must also face the problem of explaining the processes and mechanisms underlying consciousness.

Our aim in this paper is to assess the (currently) most popular proposal for accounting for consciousness under the PEM framework and identify problems with its underlying assumptions, which we think can be resolved by applying a deflationary interpretation drawn from the work of Daniel Dennett and other like-minded philosophers and scientists (e.g. Dennett 1991; Humphrey 2011; Graziano 2013; Pereboom 2011; Frankish 2016a, b; Dołęga and Dewhurst 2019). Our approach will show how consciousness can be accommodated by the PEM framework, while also resolving some long-standing philosophical issues surrounding that topic.

In Sect. 2 we will introduce some basic aspects of the PEM framework, without going into more detail than is strictly necessary for our argument. These aspects are prediction error (2.1), generative models (2.2), active inference (2.3), and precision optimisation (2.4). We will then present Jakob Hohwy’s proposal that conscious experience is determined by the current winning hypothesis (2.5), before moving on in Sect. 3 to identify some issues with this proposal. We will make explicit a representational assumption lying behind the implicit relationship between the properties of conscious experience and the properties of generative models, which leads to two problems: the problem of unconscious representation (3.1) and the problem of unconscious perception (3.2). As we argue, both of these related issues threaten to trivialize or severely limit the scope of Hohwy’s account of consciousness. In Sect. 4 we will present Daniel Dennett’s approach to studying consciousness and the resulting multiple drafts model (especially the notion of a ‘probe’ that determines the content of consciousness, discussed in 4.3), before arguing that it is a natural fit for the PEM framework, and that it offers a deflationary solution to both problems. One consequence of applying the multiple drafts model to PEM is that it urges the framework’s proponents to adopt a somewhat revisionary attitude towards our understanding of consciousness, as it will turn out that we may be systematically misled about the nature and richness of our conscious experiences (4.4). We conclude by considering some of the further implications of this approach for the future development of the PEM framework.

2 Basic tenets of the prediction error minimization framework

PEM aims to completely overhaul the way we think about the organization and the functioning of the brain. According to this framework, the brain does not process information by accumulating raw sensory data and processing them into ever more detailed and complex representations of the world (as proposed, for example, by Marr 1982). Rather, it postulates that the nervous system operates by generating anticipatory hypotheses about the environmental causes of its future inputs, which are then tested against the actual states of the sensory surfaces and updated accordingly. This is accomplished by a hierarchically organised structure that is driven in a top-down manner, with each level predicting the activity of the level below.[Footnote 1] Predictions are made based on a ‘generative model’ of the external world, and cascade down the hierarchy, constraining activity on each of the levels below. These predictions terminate in the sensory periphery, where they are ‘tested’ against sensory inputs, with any error being passed back up the hierarchy in order to correct the model. In this section we will briefly review some of the most important features of the framework,[Footnote 2] before considering the leading proposal for how it might try to account for consciousness.

2.1 Prediction error

Unlike in traditional models of cognition and early sensory processing, only information that is inconsistent with the downward flow of predictions (i.e. prediction error) is propagated up the hierarchy for further processing (Rao and Ballard 1999; Friston 2010). This error signal carries information about the divergence between the predicted pattern of sensory stimulation and the actual state of the sensory periphery, and is propagated back up the hierarchy until it reaches a level at which it can be integrated into the system’s future predictions (Clark 2013, p. 187). This process of revising and updating the current generative model can be seen as a form of implicit, on-line learning, and begins as soon as prediction error is fed up the hierarchy. The process takes place over multiple levels and time scales, offering a high degree of plasticity (continuous revision) and efficiency (through its anticipatory nature) in real time processing.

2.2 Generative models

One of the notions which lie at the heart of the PEM proposal is that of a generative model. Clark (2013) summarizes the role that these models play in the framework as follows:

A generative model […] aims to capture the statistical structure of some set of observed inputs by tracking (one might say, by schematically recapitulating) the causal matrix responsible for that very structure. […] In practice, this means that top-down predictions within a multi-level (hierarchical and bidirectional) system come to encode a probabilistic model of the activities of units and groups of units within lower levels, thus tracking […] interacting causes in the signal source, which might be the body, or the external world […] (Clark 2013, p. 182).

Therefore, the system not only learns the regularities in the sensory input, but also infers abstract rules governing such regularities. By building a statistical model of the environment and capturing the structure of the underlying ‘causal matrix’ (Clark 2013, p. 182) responsible for the sensory data, the system can efficiently predict its input. This can be done by a variety of statistical methods such as, for example, ‘empirical Bayes’, in which priors (i.e. the statistical ‘beliefs’ or expectations which form the basis for the top-down predictions) are initially extracted from the regularities in the bottom-up signal and refined with consecutive inferences (Friston and Kiebel 2009), or variational Bayesian methods (e.g. Mathys et al. 2011, 2014), in which the problem of inferring the true causes of sensory stimulation is reformulated as one of optimizing the parameters of the internal model so as to minimize some objective function (in this case the divergence between the predicted and actual state of the sensory periphery). The exact method used does not matter for present purposes, although ultimately a plausible mechanism will have to be proposed for its neural implementation (cf. Milkowski 2016).

2.3 Active inference

Passively performing inferences and updating the generative model is, for many purposes, not enough to accommodate incoming prediction error. As Hohwy notes: “a system without agency cannot minimize surprise but only optimize its models of the world” (2012, p. 3). Thus, the system needs to also perform active inference (Clark 2013; Friston et al. 2009; Hohwy 2012), in which it not only predicts the changes in its own states, but also engages in prediction about how its interactions with the environment will affect the inputs that it receives. Thus, in addition to inferring how the world is, the system also infers action policies that it expects will be best at minimising prediction error over long time scales. These action policies can then generate real-world behaviours in a way similar to ideomotor accounts of motor control, such as the one proposed by Wolpert and Flanagan (2001; cf. Wiese 2017, though see Pickering and Clark 2014, as well as Adams et al. 2013, for important differences between forward- and generative-model based accounts of motor control). In other words, the system can actively intervene on its environment in ways that aim to minimize the ‘surprising’ signals. This can either be done by a selective sampling of the surroundings aimed at obtaining more data (e.g. eye saccades), or by manipulating the environment in a way that minimizes signal uncertainty and brings the surroundings into a tighter fit with the current model (cf. Friston et al. 2009). The latter offers a very general mechanism for action control, as any behaviour of the system can be reconceived as aiming to minimize prediction error over the long term, and is therefore a crucial ingredient in the PEM framework’s claim to unify perception and action under one model.

2.4 Precision optimization

One problem faced by a PEM system is that not all prediction errors will be the result of inaccurate predictions about the world, but rather some may simply be caused by volatility in the environment or noise in the system’s sensory channels. Thus, a PEM system needs to account for uncertainty in the error signals it receives, and aim to minimize only those that constitute a reliable learning signal. PEM proposes to alleviate this problem by estimating the expected precision of the prediction errors (usually formalized as the inverse of their variance), which is used to weight different error signals according to their reliability. The precisions of prediction errors are context dependent, i.e. the precision that can be expected from different sensory modalities and in different environmental conditions will vary, and therefore need to be learned over time. This suggests the need for an additional, domain general control mechanism, either sitting on top of the main prediction/error hierarchy, or incorporated into the system and acting as a ‘virtual’ controller.[Footnote 3] Either way, the system needs “to represent the known unknowns” (Hohwy 2012, p. 5) which determine the precision of the incoming error signals.
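The weighting of error signals by their expected precision can be illustrated with a minimal sketch. It assumes the simplest possible case, a single Gaussian belief updated by a single Gaussian observation; the function name and the numbers are hypothetical and are not drawn from Hohwy or Friston. The same prediction error moves the estimate a long way when the sensory channel is expected to be reliable, and only slightly when it is expected to be noisy.

```python
def precision_weighted_update(prior_mean, prior_precision, observation, sensory_precision):
    """One Gaussian belief update driven by a precision-weighted prediction error.

    Illustrative assumption: both the prior belief and the sensory signal are
    Gaussian, and precision is the inverse of variance.
    """
    prediction_error = observation - prior_mean
    # The gain is the share of total precision carried by the sensory signal.
    gain = sensory_precision / (prior_precision + sensory_precision)
    posterior_mean = prior_mean + gain * prediction_error
    posterior_precision = prior_precision + sensory_precision
    return posterior_mean, posterior_precision


# Same prior and same observation, different expected sensory variance.
reliable = precision_weighted_update(0.0, 1.0, 10.0, sensory_precision=1 / 0.25)
noisy = precision_weighted_update(0.0, 1.0, 10.0, sensory_precision=1 / 4.0)

print("reliable channel:", reliable)
print("noisy channel:", noisy)
```

With these made-up values the reliable channel pulls the estimate most of the way towards the observation (to roughly 8), while the noisy channel moves it only to roughly 2. How expected precisions are themselves learned across contexts is left out of the sketch.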

Recently it has been proposed that the role played by the optimization of expected precision in the PEM framework is equivalent to that of attention (Feldman and Friston 2010; Hohwy 2012; Brown and Friston 2013). According to this proposal, exogenous attention is present in the case of “stimuli with large spatial contrast and/or temporal contrast (abrupt onset)” (Hohwy 2012, p. 6), and is a result of the system’s expectation that strong error signals, i.e. ones standing out from the predicted noise level, will be more precise. Endogenous attention, on the other hand, is the result of a top-down process in which predictions about incoming stimuli increase signal gain in the relevant receptive fields. The function of endogenous attention is to decrease the uncertainty of future stimuli rather than reacting in response to unexpectedly strong incoming error signals (Hohwy 2012, p. 6). This type of attention can be illustrated using the example of visual search tasks, where the search is driven by a highly precise prediction of a particular stimulus that the subject needs to locate. According to PEM, this increases the gain on the receptive field sensitive to the object of the search. Once a pattern of retinal activations similar to the target is found, the high gain on the error signal pathways guarantees that only a hypothesis tailored to minimizing that very precise signal (corresponding to the object of the search) will be selected for driving the system’s behavior. This precision/attention mechanism will play an important role in our deflationary account of consciousness in the PEM model, and we will return to it later (in Sect. 4).

2.5 Conscious experience as the ‘winning hypothesis’

Finally, we arrive at the most popular proposal for how to explain consciousness within the PEM framework, namely that “conscious perception is determined by the prediction or hypothesis with the highest overall posterior probability – which is overall best at minimizing prediction error” (Hohwy 2012, p. 4).[Footnote 4] To put it simply, what we perceive is the brain’s best guess about ‘what is out there’. This simple proposal is expanded upon by Hohwy (2012), who points out that by assuming this perspective, PEM ends up treating attention and perception as “two distinct, yet related aspects of the same prediction error minimization mechanism”, that together are responsible for determining “which contents are selected for conscious presentation” (Hohwy 2012, p. 1). At the core of this approach are two properties that control the dynamics of the bidirectional (i.e. top-down and bottom-up) exchange of information within the PEM system, namely accuracy and (expected) precision (Hohwy 2012, p. 5), which together form the “statistical dimensions of conscious perception” (Hohwy 2012, p. 5), a state space within which conscious and unconscious perceptual states can be located and compared (see Fig. 1).

Fig. 1 A diagram depicting the relationship between conscious and unconscious states with their accuracy (x-axis) and precision (y-axis) (from Hohwy 2013, p. 203)

Hypotheses with high accuracy can be understood as being better at representing the causal structure of the world, since accuracy can be taken to refer “to the inverse amplitude of prediction error per se” (Hohwy 2012, p. 5), representing the degree to which a hypothesis can account for the incoming sensory signal. However, as will become relevant in the later parts of this paper, Hohwy points out that this characterization involves a slight simplification, as “surprise has both accuracy and complexity components, such that minimizing surprise or free energy increases accuracy while minimizing complexity” (Hohwy 2012, fn. 2). What this means is that accurate hypotheses are not only good at minimizing prediction error, but can accomplish that by recruiting fewer resources (e.g. by making use of models with fewer parameters). As Hohwy explains: “This ensures the explanations for sensory input are parsimonious and will generalize to new situations […]” (Hohwy 2012, fn. 2).
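The accuracy and complexity components mentioned in the quotation correspond to a standard decomposition of variational free energy. The notation below is an illustrative assumption rather than Hohwy’s own: y stands for the sensory input, ϑ for its hidden causes, p(ϑ) for the prior over those causes, and q(ϑ) for the system’s approximate posterior.

```latex
F \;=\; \underbrace{D_{\mathrm{KL}}\big[\, q(\vartheta) \,\|\, p(\vartheta) \,\big]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{q(\vartheta)}\big[ \ln p(y \mid \vartheta) \big]}_{\text{accuracy}}
  \;\geq\; -\ln p(y)
```

On this reading, free energy F is an upper bound on surprise, −ln p(y). Lowering F rewards hypotheses that explain the input well (high accuracy) while penalizing those that depart far from prior expectations (high complexity), which is the sense in which the resulting explanations are parsimonious.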

Precise hypotheses, on the other hand, are those that are best at minimizing prediction error relative to the system’s prior expectations of the error signals’ variability. Thus, precise hypotheses are those that are best at explaining the error that is least random for the given prior distribution, i.e. they are best at ignoring noise and picking out the input that is relevant for a given context (Hohwy 2012, p. 4). Choosing the ‘best’ hypothesis involves making trade-offs between these two dimensions, as sometimes the most precise hypothesis might not be the one that is most accurate, and vice versa.

Although this proposal is centered around the basic assumption that conscious perception will typically belong within the space of high accuracy and precision (Fig. 1), Hohwy admits that this might not always be the case. It is possible that “a relatively inaccurate but precise model might determine conscious perception over a competing accurate but imprecise model, and vice versa” (Hohwy 2012, p. 5). The latter is what supposedly happens in gist perception, where subjects can perceive the general gist of a briefly presented scene, even though they are unaware of the details (see Bayne 2016 for a comprehensive overview). Conversely, the ‘ventriloquist effect’ (Jack and Thurlow 1973), involving a misperception of the location from which a sound comes, is an example in which our perceptual discrimination is highly precise and yet still inaccurate.

The role of attention in bringing predictions into consciousness is, therefore, twofold. Firstly, attention can raise a perceptual hypothesis into consciousness by adjusting precision weightings on either a strong incoming error signal (exogenous attention), or a signal expected to be highly reliable/salient in a given context (endogenous attention). Secondly, it can also serve as a kind of ‘bias control’ in cases of competition between hypotheses (Hohwy 2012, p. 7). Hohwy links this second role of attention to studies on the receptive fields of retinotopic neurons, which indicate that a single neuron’s activity is decreased “when two different objects are present in the neuron’s receptive field” (Hohwy 2012, p. 7). Following Feldman and Friston (2010), he proposes attention as the mechanism responsible for biasing neuronal responses between different inputs falling within the same receptive field, thus resolving the conflict and preventing further decrease of neuronal activity in that field. This role of attention is crucial for the PEM framework, as only one hypothesis can drive the system’s behavior in a particular task at any given time. This is consistent not only with the empirical findings of receptive field studies (e.g. Robert Desimone’s famous research on biased competition, see Desimone and Duncan 1995; Desimone 1998), but also with research on perceptual phenomena like binocular rivalry (Dijkstra et al. 2016; O’Shea et al. 2013), in which neurotypical subjects experience alternation between percepts that are simultaneously presented in each of their visual hemifields. In the case of the latter phenomenon, PEM conceptualizes the two percepts (e.g. a house and a face) as distinct hypotheses generated by the system in order to try and predict the current state of the visual environment.
The perceptual system is unable to resolve the rivalry between these competing hypotheses because both are equally probable given the sensory evidence, nor can it synthesize the two into a singular percept due to their incompatibility with prior expectations about the structure of the environment (i.e. that objects belonging to separate categories cannot occupy the same spatiotemporal location). This is why the competition is resolved sequentially over time, resulting in an alternating percept. Although the heuristic description provided here is quite simple, a computational model inspired by this approach has recently been validated with fMRI data collected by Weilnhammer et al. (2017), who showed that the PEM model outperformed other competing accounts in predicting increased responses from cortical areas implicated in controlling the rate of perceptual bistability.
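The heuristic account of rivalry given above can be made concrete with a toy calculation. This is a sketch under stated assumptions, not the computational model tested by Weilnhammer et al. (2017): the hypothesis labels, priors, and likelihoods are invented for illustration. The ‘face’ and ‘house’ hypotheses each explain half of the input, while a combined hypothesis explains the input better but has a very low prior because it conflicts with expectations about the structure of the environment.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Illustrative numbers only (assumptions, not taken from any study).
priors = {"face": 0.495, "house": 0.495, "face_and_house": 0.01}
likelihoods = {"face": 0.5, "house": 0.5, "face_and_house": 0.9}

post = posterior(priors, likelihoods)
best = max(post.values())
# Collect every hypothesis that shares the highest posterior probability.
winners = sorted(h for h, p in post.items() if abs(p - best) < 1e-12)

print(post)
print(winners)  # prints ['face', 'house']
```

With these numbers the combined hypothesis receives a low posterior despite its better fit, and ‘face’ and ‘house’ are tied for the highest posterior. No single winner exists at any one moment, which is the situation that the account above takes to be resolved sequentially over time.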

To sum up, Hohwy’s proposal is that the status of a hypothesis as conscious or unconscious is determined by two separate dimensions: precision and accuracy (see Fig. 1). A hypothesis that is both precise and accurate will play a strong role in determining consciousness, whereas a hypothesis that is lacking in both dimensions will not, with other hypotheses falling somewhere along a spectrum from conscious to unconscious. One advantage of this proposal is that it can account for some variation in the quality of different conscious states: “inattentive but conscious states would cluster towards the lower right corner and attentive but unconscious states would cluster towards the upper left” (Hohwy 2012, p. 203). The proposal is also consistent with some recent findings that have been used to develop the PEM framework (such as the binocular rivalry studies described above), thus lending it a degree of empirical credibility.

3 Two problems with the ‘winning hypothesis’ approach

Our aim in this section is to explore the tacit assumptions behind the ‘winning hypothesis’ approach, revealing several conceptual problems that the framework is still facing. We saw earlier that the PEM framework assumes a direct connection between the content and properties of the generative model and the content and properties of conscious experience. It is relatively uncontroversial to assume that the content of perceptual experiences is a product of subpersonal processes, and that cognitive functions like attention are involved in determining which contents are made available for conscious presentation. However, the way in which numerous perceptual paradigms are framed in PEM explanations (see e.g. Clark 2013, 2016; Hohwy 2012 for an overview) implies that the framework carries a stronger implicit assumption about the relationship between the winning hypothesis (or state of the internal model) and the phenomenal aspects of conscious experience. Namely, most proponents of PEM seem to be committed to a form of representationalism about conscious experience.[Footnote 5]

Although there are many varieties of this philosophical position (see Chalmers 2004 and Lycan 2019 for comprehensive overviews), in its basic form representationalism about conscious experience is the thesis that the phenomenal aspect or properties of conscious experience are part of, or coincide with, the content which is represented by that experience.[Footnote 6] Proponents of strong versions of the view (e.g. Dretske 1995; Tye 1995) postulate that representation of a certain kind is sufficient for phenomenal properties (“where the kind can be specified in functionalist or other familiar materialist terms”, Lycan 2019, Sect. 2.1) or that such properties reduce to representational ones (Kind 2007, p. 406). Weak representationalists (Block 1990, 1996; Chalmers 1996), on the other hand, are only committed to the view that phenomenal properties (of a certain kind or within a certain class) entail representational properties (of the same kind or class), but not vice versa, meaning that some further story is required about how phenomenal properties are determined.

A commitment to some form of this view is clear from the deployment by PEM proponents of the argument from transparency, which has long played an important role in the representationalists’ argumentative arsenal (cf. e.g. Moore 1903; Harman 1990; Tye 1995, 2003; Dretske 2003; Metzinger 2003, 2008). According to this argument, our perceptual experiences are transparent or diaphanous to us because our introspection into ‘what it is like’ to experience something does not reveal that we are in a perceptual state, while “the properties we are aware of in perception are attributed to the objects perceived” (Lycan 2019, Sect. 4.3). For example, when I see a red tomato, I experience the redness of the tomato as a property of the object which is the content of my conscious state, and not as a property of that conscious state itself. On the PEM side, Limanowski and Friston (2018) propose that this property of our experiential states “can metaphorically be understood as looking through a window onto the world, instead of looking at the window itself: we only access the representation’s intentional content (something in the world which it is about) without noticing its non-intentional carrier properties” (2018, p. 4), revealing a clear commitment to representational transparency.

Hohwy (2014, pp. 184–185) also embraces representationalism by arguing that PEM satisfies key requirements for a successful theory of phenomenal experience as laid out by Frank Jackson (2003). According to him, conscious representations in PEM are not only rich in detail due to an underlying “variance of percepts from the first-person perspective as represented in the fast time-scale causal regularities in the lower levels of the perceptual hierarchy” (Hohwy 2014, p. 184), but can even be said to be “inextricably rich” due to the “richly hierarchically structured perceptual inference (including binding, penetrability, and sensory integration); [as on PEM] there is no perceptual inference without a perceptual hierarchy to accommodate empirical Bayes.” (ibid.).[Footnote 7]

Yet, despite this, it is rather unclear which version of the view (strong or weak) Hohwy subscribes to. On the one hand, he states that PEM “seems to tick the boxes that Jackson and others have set out as requisites for a substantial representationalism about consciousness.” (Hohwy 2014, p. 184). While it is open to interpretation whether ‘substantial’ is meant as synonymous with ‘strong’, it seems quite clear that a strongly-representational position would be the best fit for PEM’s explanatory ambitions, as it would allow for an explanation of consciousness without relying on resources external to the framework. However, Hohwy undercuts this interpretation by also claiming that PEM is unlikely to solve the mystery of phenomenal experience and that the best it can hope for is to offer some insight into the nature of the mechanisms that bring about key aspects of conscious perception (such as perceptual binding, illusions, and mental illness) and to help us capture “many of the things we care about in conscious experience” (Hohwy 2014, p. 208). Such statements side with a weakly representational view, but are inconsistent with his commitment to the framework’s explanatory power and ambitions.

In the remainder of this section we will investigate two problems emerging from these competing interpretations of PEM’s representational commitments, and show that either interpretation carries a significant cost for its proponents.

3.1 The problem of unconscious representation

Let us now return to the previous example of binocular rivalry. According to PEM, the subject might have the experience of a face, rather than a house, because the ‘face’ hypothesis is currently the winning one and the ‘house’ hypothesis is suppressed (Hohwy et al. 2008). Therefore, the face hypothesis determines the content and phenomenal quality of the conscious state (at this time). However, the house hypothesis is also represented in the system at the same time. The default view about how hypotheses come to determine conscious experience, when paired with PEM’s commitment to representationalism about consciousness, seems to lead to a puzzle.

If PEM is committed to the strong view and phenomenal properties are identified with representational properties, then proponents of the framework need to spell out why the competing hypotheses are not both experienced phenomenally at the same time. After all, what changes in the process of switching are the probabilities assigned to such hypotheses, not their contents. A common answer to this kind of puzzle is that strong representationalism requires representations of a certain kind for content to be conscious (see e.g. Tye 1995). This may be helpful to PEM proponents in the short run, but eventually they will have to face a further question: what is it about the attribution of highest posterior probability that makes contents conscious?

Perhaps it is the difficulty of answering this second question that pushes Hohwy to sometimes interpret PEM as being committed only to weak representationalism (see e.g. Hohwy 2014, pp. 208–209). Moving in this direction helps to sidestep the question altogether, since weak representationalism does not bind one to the claim that phenomenal properties are representational properties. However, it forces a proponent of the framework into a sort of quietism in which they have to state that, even though representations with the highest posterior probability do determine conscious perception, PEM does not answer why or how such states contribute to the phenomenal aspect of experience.

Although Hohwy sometimes seems to side with the latter option, he also acknowledges that PEM does face the problem of answering whether representations that are suppressed, but nevertheless processed by the system, do come with phenomenal properties:

There remains the rather important and difficult question whether or not the unseen stimulus is in fact consciously perceived but not accessible for introspective report, or whether it is not consciously perceived at all; this question relates to the influential distinction between access consciousness and phenomenal consciousness (Block 1995, 2008). (Hohwy 2012, p. 7)

Here, he refers to the distinction between the phenomenal, qualitative aspect of a state and the availability of that state to introspection, report, and other forms of executive functions, i.e. its accessibility to consciousness. Ned Block (1995, 2008) and Victor Lamme (2006) have argued that all that is needed for a state to be imbued with phenomenal quality is that it is sustained over time by recurrent activity within a sensory cortex, even in the absence of any kind of cognitive access to that state (Block 2007). Hohwy does not directly comment on this issue, but he does seem to side with Block and Lamme when he speculates that:

(i) access consciousness goes with active inference (i.e. minimizing surprise though agency, which requires making model parameters and states available to control systems), and (ii) phenomenal consciousness goes with perceptual inference (i.e. minimizing the bound on surprise by more passively updating model parameters and states). (Hohwy 2012, p. 7)

While the above proposal may solve the present problem (by effectively embracing the view that hypotheses that are represented in the PEM system but are inaccessible to the subject do come with phenomenal qualities), it ends up introducing additional confusion about the position held by Hohwy. Firstly, the proposal is ambiguous between embracing an unusually strong version of representationalism on which all that is needed for content to instantiate phenomenal properties is that it is rendered in the system via perceptual inference, and claiming that perceptual inference is somehow instrumental, but not sufficient, for the instantiation of phenomenal properties. Embracing the former position leads to a proposal which may not be coherent—how can one be in a phenomenally conscious state, and yet not be aware of it? How could I be in a qualitative state that ‘it is something like to experience’, yet not know ‘what that state is like’? Cohen and Dennett (2011) have criticized this position and pointed out that the very possibility of phenomenal experience in the absence of cognitive access is not verifiable in any way, since all empirical methods of investigating consciousness require the availability of some form of explicit or implicit report. Embracing the latter option—that only perceptual and not active inference plays a role in instantiating phenomenal properties—is less problematic, but it is significantly underdeveloped, and it is unclear whether such an interpretation is consistent with Block’s original phenomenal/access distinction (as Block and his followers associate phenomenal consciousness with particular stages of perceptual processing rather than kinds of processing). 
However, the most unwelcome consequence of embracing either solution is that neither of them can help us understand how the initial proposal that consciousness is determined by the hypothesis with the highest posterior probability can account for the well-documented phenomenon of unconscious perception, to which we turn next.

3.2 The problem of unconscious perception

Our ability to respond above chance to stimuli that we are not consciously aware of is well documented in the literature. It seems that some perceptual signals are simply too weak to ever become conscious, but can nonetheless influence our behaviour. For example, subjects fail to detect subliminal primes even in controlled settings, but such stimuli can still drive behavioral responses (see e.g. Norman et al. 2014; Faivre et al. 2014; see also Ansorge et al. 2014 for a review). On the other hand, several studies have shown that percepts which were not consciously registered by the subjects can later be brought into consciousness by attentional cues directed at the location of the stimulus after its disappearance (Kentridge 2013; Sergent et al. 2013; Thibault et al. 2016). This is important, as it lends support to the PEM idea that, by modulating the probability of the bottom-up signals, attention determines how much influence their probability should have on the posterior. However, this also seems to pose a problem as it indicates that whether or not some content is available for conscious perception is not just a question of its relative probability, but also other factors (such as subsequent attentional cues).

Consider the cases of blindsight and visual form agnosia. The former occurs when patients who have suffered lesions to the primary visual cortex become ‘blind’ on one side of their visual field, but nonetheless retain some discriminatory ability on that side (see e.g. Humphrey 1974; Weiskrantz 1986; Holt 2003). The latter is a condition in which localized damage to the temporal lobes causes loss of the capacity to recognize shapes or objects, but leaves vision-guided motor coordination mostly unaffected (see e.g. Benson and Greenberg 1969; Milner et al. 1991; Carey et al. 1996). Patients suffering from these conditions have been shown to detect, localize, and act upon visual stimuli apparently without experiencing them consciously, which in turn has been proposed as evidence that perception can affect behavior in the absence of consciousness. For example, one particular subject suffering from visual form agnosia, known as D.F., has been shown to retain the ability to perform simple visuo-motor tasks (like grabbing a cup or posting a letter into a letterbox) nearly as well as, or even as well as, healthy controls, despite not being able to discriminate the shapes of objects involved in the tasks (i.e. performing at chance in shape or object discrimination tasks).

When looking at such cases from the perspective of PEM, it seems that the visual system does form distinct, albeit imprecise, hypotheses about the objects present in the portion of the visual field affected by the cortical damage. Since such hypotheses are able to affect behavioral responses, they must at some point be assigned the highest posterior probability, i.e. be the ‘winning hypothesis’ on some level of the hierarchy. Yet they are not experienced in consciousness, and so posterior probability alone cannot be the only property that determines whether a hypothesis is selected for conscious presentation.

Squaring such evidence with Hohwy’s earlier stipulation that phenomenal consciousness is associated with perceptual inference is also difficult. It seems that, by subscribing to Block’s distinction, Hohwy’s original proposal that consciousness is determined by the hypothesis with the highest posterior probability becomes severely limited in scope. Embracing Block’s distinction means that the winning hypothesis view is only an account of access consciousness and not phenomenal consciousness. However, even that might not work once we consider visual form agnosia, which implies that sometimes unconscious states do take control of the system’s behavioral responses, in the same way that conscious states can, but without any (reported) phenomenal experience. Effectively this means that, at least in some cases, hypotheses can be ascribed the highest posterior probability while remaining unconscious, which in turn undermines the idea that the ‘winning hypothesis’ view offers a complete account of the functional aspect of consciousness.Footnote 8

Hohwy anticipates this problem by pointing out that the ‘winning hypothesis’ view does not necessarily apply to hypotheses on a single level of the hierarchy, but rather should be understood as targeting hypotheses that successfully minimize prediction error across multiple levels:

When a specific model is the one determining the consciously perceived content it is just because it best minimizes prediction error across most levels of the cortical hierarchy—it best represents the world given all the evidence and the widest possible context. This is the model that should be used to selectively sample the world to minimize surprise in active inference. Competing but less probable models cannot simultaneously determine the target of active inference: the models would be at cross-purposes such that the system would predict more surprise than if it relies on one model alone (for more on the relation between attention and action, see Wu 2011). (Hohwy 2012)

According to Hohwy, then, there is no specific threshold for consciousness, but his answer to the question implies that whether a hypothesis is accessible depends on the landscape of probabilities in which it competes. The reason why a relatively inaccurate and imprecise hypothesis can sometimes be the one driving conscious experience is that all the competing models are ‘worse’. The winning hypothesis, simply put, merely needs to be better than the competition. This move pushes back on our accusation that the ‘winning hypothesis’ cannot account for the functional aspect of conscious experience, while embracing the proposal that it just is an account of access consciousness.
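The claim that a winning hypothesis only has to beat its competitors can be illustrated with a toy Bayesian calculation. The sketch below is our own illustration rather than anything proposed by Hohwy: the hypothesis labels, priors, and likelihoods are made-up numbers, and the function names are hypothetical. It shows only that a hypothesis which fits the evidence poorly in absolute terms can still end up with the highest posterior once the probabilities are normalised over the available competitors.

```python
def posteriors(priors, likelihoods):
    """Bayes' rule over a fixed set of competing hypotheses.

    Each posterior is prior * likelihood, normalised by the sum over all
    competitors, so it depends on the whole 'landscape' and not on the
    hypothesis taken in isolation.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


def winning_hypothesis(priors, likelihoods):
    """Return the hypothesis with the highest posterior, plus all posteriors."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get), post


# Hypothetical numbers: every hypothesis predicts the evidence poorly
# (all likelihoods are small), yet one of them still wins clearly.
priors = {"cat": 0.5, "dog": 0.3, "fox": 0.2}
likelihoods = {"cat": 0.02, "dog": 0.01, "fox": 0.005}

winner, post = winning_hypothesis(priors, likelihoods)
print(winner, round(post[winner], 3))
```

On these assumed values the ‘cat’ hypothesis wins with a posterior of roughly 0.71, even though its likelihood is only 0.02. Nothing in the calculation sets an absolute threshold that the winner has to clear.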

All of the above seems to support the weakly representational interpretation offered at the end of Sect. 3.1. While this may be a satisfying conclusion for many, and the position certainly has the virtue of being restrained, we would like to (once again) stress that, at least at face value, it is inconsistent with PEM’s ambition to explain all facets of our mental lives (Hohwy 2013, p. 1). Embracing such a position, therefore, is an admission of PEM’s limited explanatory power and inability to deliver on its promises. In the next section we will argue that this view can be strengthened by rejecting Block’s access/phenomenal distinction, which would enable it to offer an answer to questions about the nature of phenomenal consciousness (albeit a deflationary one), and therefore rescue some of PEM’s broad explanatory ambition.

4 A deflationary approach to PEM and consciousness

Having presented an overview of PEM and assessed one popular proposal for how it might account for consciousness, we will now turn to our own alternative position. Our approach has much in common with Hohwy’s position, insofar as it retains an important role for attention in determining conscious content, but it moves away from the idea that high probability alone is sufficient for content to be conscious.Footnote 9 Our proposal is that in order to (potentially) enter consciousness, a hypothesis must be ‘probe-able’, a feature that depends partly (but not entirely) on its probability. As we will explain in more detail shortly, the Dennettian concept of a ‘probe’ that determines consciousness fits nicely with the concept of active inference and the role of attention in PEM, and so we will propose that the PEM mechanism responsible for controlling attention can serve as an implementation of probing. By adopting this approach, we will reframe Hohwy’s original proposal as a self-standing approach to consciousness and resolve the problems it faces.

Before presenting our proposal in more detail, we will first outline some important philosophical assumptions that underwrite this approach. Our aim here is not to present a defense of these assumptions, but rather to clearly state where we are coming from, so that any disagreement will (hopefully) be made more perspicuous. We will then explain how we think this approach helps us to resolve (or dissolve) the two problems outlined above, before moving on in the final section to consider some further outstanding issues.

4.1 The method of heterophenomenology

There is widespread agreement that any serious empirical investigation into conscious experience must take subjective verbal reports as a genuine datum (see e.g. Piccinini 2003; Chalmers 2013). To explain consciousness we need to explain why we experience and report our experiences as being this way rather than another (Chalmers 2018). However, this basic intuition clashes with the fact that first person reports about subjective experience, while a valuable source of data about perceptual processing, are often considered to be unreliable due to the lack of any possibility for verification (Dennett 1991; Schwitzgebel 2007, 2012b). Medical conditions like visual anosognosia, in which subjects are unaware that they are blind (Prigatano and Schacter 1991); visual illusions, like the color phi phenomenon (Kolers and von Grünau 1976); and the fact that subjects systematically overestimate the precision and richness of visual information outside of the foveal area (Dennett 1991) and outside of the scope of attention (Knotts et al. 2018) all indicate the general unreliability of our introspective abilities (see e.g. Schwitzgebel 2007, 2012a, b for further discussion).

Dennett’s ‘heterophenomenological’ methodology (1991, 2003, 2006) is aimed at taking both of these points seriously. In contrast to the phenomenological tradition (see Smith 2018) and introspection-based psychology (e.g. Titchener 1901), heterophenomenology does not consider subjective self-reports as ultimately authoritative about mental phenomena, but rather treats them as non-committal fictions, which carry information about how things seem or are judged to be by the subjects, rather than how they really are. In this regard heterophenomenology is claimed to be similar to anthropology, as it suspends judgement about the reality of the entities and properties featured in subjects’ reports, in a similar way to how an anthropologist studying some peoples’ religious system does not pass judgement on the existence of deities and other entities that are the object of those peoples’ beliefs. As Dennett points out, this way of doing science of the mind “is nothing new; it is nothing other than the method that has been used by psychophysicists, cognitive psychologists, clinical neuropsychologists, and just about everybody who has ever purported to study human consciousness in a serious, scientific way” (2003, p. 22).

Associated with this proposal is the rejection of the idea of phenomenal properties (or ‘qualia’) as ineffable and intrinsic properties of experiential states, to which conscious subjects have immediate (and privileged) epistemic access (cf. Dennett 1988). Since properties present in experience can be in some sense illusory (e.g. peripheral vision appears as rich and detailed even though it is not, colors appear to be the intrinsic properties of objects even though they are bundles of complex relational properties; cf. Cohen 2009), consciousness researchers should not aim to explain the existence of such properties themselves, but should rather explain why conscious experience appears to have such properties (Frankish 2016a, b, cf. Chalmers 2018), i.e. why and how subjects come to believe that their experiences have such properties.

As mentioned before, our approach in the remainder of this paper will be fundamentally deflationary, in just the way implied by heterophenomenology. We take the strong representational proposal that phenomenal properties are identical to representational properties quite seriously, and the solution to the problems posed in the previous section will simply require an explanation for why some hypothesis (instead of another) determines the content of a verbal report about conscious experience, rather than determining the phenomenal properties of that experience as such. Following Dennett, we think that taking this course of action will result in rendering the problematic or mysterious character of phenomenal properties as nothing more than an adaptive illusion. Of course, in line with the outlined method, and in order to fully amend Hohwy’s account, we will also need to say why or how such an illusion comes about. With these preliminary remarks out of the way, we will now move on to our main positive proposal.

4.2 Multiple-drafts model and the ‘fame in the brain’ metaphor

The PEM account of consciousness developed by Hohwy (and others) bears some interesting similarities to Dennett’s multiple drafts (or ‘fame in the brain’) approach, which we think positions it well to adopt the further innovation of a ‘probe’ that determines conscious content. According to Dennett’s model of consciousness there are multiple, parallel streams of information that are being processed in the brain at any given time. Conscious experience is determined by a ‘probe’ that, through a process of eliciting reports (verbal or otherwise) from a subject, takes one of the competing information streams as a more-or-less complete story about what is going on in the mind at that time (1991, p. 135). Probing may be verbal, environmental, or self-induced; what matters is that the subject is able to respond to the probe in an appropriate way, i.e. by producing an internal or external indication of their apparent conscious experience (e.g. a silent or vocalized report, or the production of some behavior that is consistent with the apparent conscious experience). In this context a ‘probe’ is a kind of higher-order, meta-cognitive state which takes another perceptual or cognitive state as its object. Prior to being probed in this way, there is no determinate fact of the matter about what the subject is conscious of, but rather a stream of possible ‘drafts’, each of which is potentially available to determine the content of a report if it is probed in the right way (we could say that a draft of this kind is ‘probe-able’). As Dennett puts it, there are “multiple ‘drafts’ of narrative fragments at various stages of editing in various places in the brain. Probing this stream at different places and times produces different effects, precipitates different narratives from the subject” (Dennett 1991, p. 113).

Although it is impossible to pinpoint which contents will be selected for consciousness prior to a report, any process can have more or less ‘fame’ or ‘clout’ (Dennett 2005, pp. 136–138), i.e. be more or less likely to feature in a subject’s report, in virtue of the functional role it plays in global information processing in the brain. Thus, the question of which states and processes ‘become conscious’ boils down to a distinct question of which of many possible ‘drafts’ (i.e. information streams) is the one exerting the most influence on the system at the time of the probe, and thus which draft becomes available for report (verbal or otherwise). While it may seem counterintuitive, this means that, on Dennett’s conception, there is actually no fact of the matter about what a system is conscious of until it is probed (although it is important to note that a probe may be either internally or externally generated, meaning that a system might become conscious ‘spontaneously’, at least as it appears from the outside). Dennett’s metaphor of ‘fame in the brain’ is meant to refer to how likely a state is to be in control of the system at the time it is probed, and thus how likely it is to determine the content of consciousness at that moment.

While we do not wish to subscribe to the claim that the contents of unconscious states are underdetermined prior to probing (according to PEM, there is a fact of the matter about which hypothesis is currently ascribed the highest posterior probability, even if its contents are inaccessible to the subject), we do think that the notion of ‘fame in the brain’ is extremely similar to the idea of the winning hypothesis being responsible for determining the contents of conscious experience (described in Sect. 2.5), while the notion of probing can be treated as equivalent to the way that PEM describes attention and active inference (Sects. 2.3 and 2.4). We develop this proposal in more detail below.

4.3 Probe-ability determines conscious content

Probing, we want to suggest, can be cashed out in terms of the PEM notion of attention. A probe can be considered a higher-order process of predicting the states of the system in accordance with the expected precision of the incoming error signals.Footnote 10 As we have already suggested in Sect. 2.4, for attention to be successfully deployed the system needs to draw on stored information about the context in which the prediction errors are generated. In other words, it needs to draw on the multi-level representations which shaped the hypotheses that generated the erroneous predictions, in order to assess the reliability of the errors related to said predictions. Thus, the successful deployment and control of attention on a given level of the hierarchy will require the system to issue descending predictions about the expected states on that level of the hierarchy, in order to predict what precision the errors on a given level should have. Effectively, the system needs to be predicting its own states in order to deploy attention, and these top-down meta-predictions, we want to suggest, constitute what Dennett calls a probe.Footnote 11

Recall that, according to PEM, attention is the allocation of expected precision based on the relative spread of the sensory samples. There are two distinct ways in which attention can influence the attenuation of prediction error. Exogenous attention operates by raising the precision of predictions that can account for sudden or especially pronounced error signals, increasing the strength of the hypothesis that generates those predictions and thus making it more likely that that hypothesis will come to dominate the hierarchy. This can be likened to the way in which a sudden, unexpected event can act as a probe, generating a conscious report from a system that was previously operating ‘on autopilot’, so to speak. We can illustrate this with the example of driving down a familiar street, not really paying any (conscious) attention to your surroundings, when another car pulls out in front of you and forces you to suddenly slam on the brakes. Even though prior to this event there may not have been any fact of the matter about what you were ‘conscious’ of, immediately afterwards you will have a seemingly conscious memory of what was happening prior to the car pulling out (cf. Dennett 1991, p. 137). From the PEM perspective, this seems to be a clear case of exogenous attention determining the content of conscious experience by raising previously unconscious hypotheses into awareness, in a way that fits very neatly with the Dennettian story about probing.

Endogenous attention, on the other hand, operates in a top-down fashion, where the system raises the expected precision of predictions matching the object of attention, thereby increasing the salience of sensory inputs that confirm these predictions. This type of attention can be illustrated using the example of Where’s Waldo? books. The premise of these books is that the reader has to find the character Waldo, who is hidden in complex, confusing images filled with human and animal figures. The search is driven by a top-down prediction of a particular stimulus, Waldo, appearing against the noisy background. Thus, this process relies on the mechanism of endogenous attention—the gain on the receptive field sensitive to the object of the search is increased. Notice that this requires the attentional control system to form a high-level hypothesis about the target of the search. Once a pattern of retinal activations corresponding to those possibly caused by the target is found, the high gain on the error signal pathways guarantees that only a hypothesis tailored to minimize that very precise signal (congruent with the ‘Waldo hypothesis’) will be selected for driving the system’s behavior. In this case, attentional modulation comes from within the system itself, highlighting the draft (a hypothesis or a collection of hypotheses on multiple levels) that corresponds to the behavior of scanning the page looking for Waldo. Thus, the system will be primed to report a conscious experience with this content, though importantly other ‘shorter’ drafts can also capture a system’s resources in an exogenous manner during visual scanning.Footnote 12
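Both kinds of attention described above are said to work by changing the precision assigned to sensory signals. A minimal numerical sketch of what this amounts to is the standard precision-weighted update for a Gaussian prior combined with a single Gaussian-distributed sensory sample. Treating attention as nothing more than a higher value of `sensory_precision` is a simplifying assumption made here for illustration, and the function name and all numbers are hypothetical.

```python
def precision_weighted_update(prior_mean, prior_precision, sample, sensory_precision):
    """Combine a Gaussian prior with one Gaussian sensory sample.

    Precision is inverse variance. The posterior mean is the average of the
    prior mean and the sample, each weighted by its precision, and the
    posterior precision is the sum of the two precisions.
    """
    posterior_precision = prior_precision + sensory_precision
    posterior_mean = (
        prior_precision * prior_mean + sensory_precision * sample
    ) / posterior_precision
    return posterior_mean, posterior_precision


# Hypothetical values: the prior expects 0.0, the sensory sample says 1.0.
prior_mean, prior_precision = 0.0, 4.0
sample = 1.0

# Unattended: low expected precision, so the sample barely moves the estimate.
unattended_mean, _ = precision_weighted_update(
    prior_mean, prior_precision, sample, sensory_precision=1.0
)

# Attended: high expected precision (assumed to model attentional gain),
# so the same sample now dominates the estimate.
attended_mean, _ = precision_weighted_update(
    prior_mean, prior_precision, sample, sensory_precision=16.0
)

print(unattended_mean, attended_mean)
```

With these assumed values the same sensory sample moves the estimate to 0.2 when unattended and to 0.8 when attended. In this toy setting, that is the sense in which raising expected precision lets a signal shape the winning hypothesis.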

Both kinds of attention contribute to determining the winning hypothesis and thus, according to Hohwy, the content of conscious experience. Here the concept of a ‘winning hypothesis’ can be likened to Dennett’s metaphor of ‘fame in the brain’: the hypothesis that currently enjoys the greatest control of the system is the one that, when probed, determines the content of experiential reports. Our proposal is that a winning hypothesis (or, in the case of a more complex stimulus/situation, a hierarchically organised bundle of hypotheses) becomes more probe-able the greater its probability. Note, however, that on this view it is not sufficient for a process to be highly probable for it to be conscious; rather, the system needs to be actively probing the model. In other words, it needs to be performing active inference on its own states. It is this additional feature, coupled with the Dennettian deflationism about the phenomenal status of a draft (or hypothesis) pre-probe, that gives us the tools to resolve the two problems introduced in the previous section.

A possible solution to what we previously called the problem of unconscious representation should now be visible. Although phenomenal properties simply are representational properties, the difference between conscious and unconscious hypotheses lies in which of these states the system is currently conducting active inference on. This means that, in cases like binocular rivalry, there may be multiple candidate hypotheses, each of which is of sufficiently high probability to be ‘probe-able’, but none of which will determine the content of consciousness until the moment that one is actually probed. The attention mechanism described above provides a simple account of how this process of probing can take place: attending to a hypothesis heightens the precision (and probability) of that hypothesis, boosting it to a level where it is able to (temporarily) take control of the system and thus elicit a verbal report (or some other indication of consciousness according to the heterophenomenological method). In the case of exogenous attention this is triggered by an external factor, such as a loud noise or sudden change in the visual field, whereas in the endogenous case it is triggered internally, and is the result of the system predicting how its lower-level internal states should be in order to minimize long term prediction error.Footnote 13

This approach can also help explain why an antecedently ‘improbable’ hypothesis (low accuracy, low precision, or some combination of the two) can sometimes nonetheless enter consciousness: the very act of probing serves to focus attention on that hypothesis, heightening its precision and thus boosting its ‘fame’. A clear example of this kind of situation is when a sudden, surprising event focuses our attention on a previously unlikely hypothesis, such as a loud noise that we immediately become aware of. Similarly, an exogenous or endogenous probe can make salient a previously unconscious hypothesis, bringing ‘into’ consciousness the events of the last few moments, and making it seem as though we were always aware of them. Classic examples of this phenomenon include focusing on the last few chimes of a clock that one was previously ignoring, but nonetheless being able to accurately determine how many chimes there were (an endogenous probe), or being jolted into awareness by a surprising event while driving along a familiar route on ‘autopilot’, but nonetheless retaining a clear sense of having been aware of driving prior to the event (an exogenous probe). In both cases the Dennettian approach would argue that the question of whether or not one was conscious ‘all along’ simply doesn’t make sense, as prior to a probe there is just no fact of the matter about what one was conscious of, but rather a hierarchy of potentially probe-able drafts (or hypotheses) that are only selected from at the moment of probing. This approach, we think, fits very neatly into the existing PEM story about the determination of consciousness, helping to make sense of some otherwise puzzling features without having to posit any additional tools or mechanisms (as hypotheses take the place of drafts, and probing is naturally identified with the deployment of the attention mechanism).
It also sheds useful light on Dennett’s more recent work, in which he has defended attention as a resource which can be distributed among many perceptually salient targets at the same time (Cohen and Dennett 2011). This fits nicely with our proposal, as the conception of probing we described above allows for many ‘famous’ hypotheses located on different levels to be simultaneously probed by levels higher up the hierarchy.

Finally, the approach sketched here can give an account of unconscious perception without falling into the same issues as the winning hypothesis account. It is not surprising that patients suffering from partial loss of vision, as in the case of blindsight, can perform certain visual tasks in the absence of consciousness. Such subjects do form hypotheses that are functionally capable of driving their responses on relevant tasks. However, since most cases of unconscious perception are caused by serious pathologies resulting from structural damage, such hypotheses are likely inaccessible for probing and precision adjustment by the rest of the system. This may also explain why, despite performing adequately on relevant tasks, subjects suffering from such deficiencies have a very limited capacity for forming subjective assessments of their own reliability.

4.4 Cleaning up the phenomenal residue

So far we have applied Dennett’s multiple drafts model to PEM and shown that, on our view, what differentiates conscious from unconscious representations is not just their posterior probability, but also the fact that they are the target of a system actively generating meta-predictions about their optimal state and precision. Given our heterophenomenological assumptions, this is enough to deal with the problems incurred by unconscious perception and representation. However, critics of the deflationary approach may point out that all we have done is reframe Hohwy’s own view about active inference being linked to the winning hypothesis—an account of access consciousness—without really addressing the problem of phenomenal consciousness. While we do not hope to completely defuse this worry, we do think that our approach can help to address at least two distinct questions that deflationary approaches sometimes face. The first question has to do with why certain worldly properties, e.g. electromagnetic waves of a certain length, are represented by contents with a distinct qualitative character, e.g. redness, while others, e.g. the number two, are not. The second question is about accommodating our intuitions that there is something more to subjective properties than them just being representations, i.e. that there is something ‘special’ or ‘mysterious’ about phenomenal consciousness, or what Chalmers (2018) has recently described as “the meta-problem of consciousness”.

The first question is largely an empirical one. For example, finding out why the human visual system is attuned to track certain physical quantities is largely a matter of exploring the morphological and ecological constraints that shaped our evolutionary history. However, the issue that typically troubles philosophers more is answering the second question: why do certain contents come with a specific subjective feel—i.e. why does that particular combination of physical properties cause such a sensation of redness, when it could have very well caused a sensation of greenness? Again, the exact nomological relation between, say, the space of possible physical quantities responsible for color perception and the space of phenomenal color sensations will likely depend on empirical facts about human vision. However, PEM can offer some insights into why certain clusters of quantitative physical properties might be represented by seemingly very different qualitative properties, i.e. why it is that quantitative properties can come to seem ‘special’ or ‘mysterious’.

Recall (from Sect. 2.5) that the successful minimization of prediction error involves minimizing surprise or free energy by increasing the accuracy of hypotheses, while simultaneously minimizing their complexity. What this means is that the PEM system is constantly making tradeoffs between speed and efficiency on the one hand, and devoting costly resources to fine-tune its representations of the world on the other. Faced with finite resources and external constraints, it is not surprising that a PEM brain might settle on furnishing internal models with only as much detail as is necessary for achieving a given task, compressing complex webs of interdependent physical causes into frugal representations that optimize not for veridicality, but rather for processing speed and efficiency. One result of this frugal representational format might be that our conscious states come to seem (to us) to be mysterious, as they are relatively lacking in detail (cf. Dołęga and Dewhurst 2019).

Combining PEM’s propensity to favor frugal representations with our proposal regarding probing can also shed further light on our intuitions about the mysterious or ineffable quality of consciousness.Footnote 14 When we take into account the hierarchical nature of the PEM cognitive architecture, and the fact that probing is essentially a metacognitive capacity involving predicting the optimal state of the system’s models by drawing on prior information about the world and the context the agent is in, it turns out to be relatively unsurprising that probing frugal models may lead to somewhat inaccurate inferences about one’s own capacities. The successful precision optimization of frugal perceptual models will require drawing on available contextual information about how such models operate, and using it to form predictions about how to optimize information processing. Thus, the successful deployment of endogenous attention, say to a particular perceptual process, will require forming a representation that is ‘twice-removed’ from the world: a facsimile of a facsimile of how the world is. Such a meta-representation will itself be optimized for complexity in the same way as its target representations are. This process of nesting content within content may be what is responsible for our intuition that there is more to the contents of our perception than just representational properties. However, this is just a ‘user illusion’ (Dennett 1991) caused by our cognitive system’s adaptive capacity for efficiency over accuracy. Thus, properties of and present in our experience only seem mysterious because of the way in which they come to be represented by the PEM system.

5 Conclusion

We have argued that Hohwy’s proposal for explaining consciousness within the PEM framework, while promising, suffers from two outstanding problems. Firstly, it seems to suggest the existence of phenomenal-yet-unconscious states, as there can be cases where two or more functionally equivalent states fulfil all the criteria for consciousness, but only one is experienced by the subject as being conscious. Secondly, and relatedly, it seems that Hohwy’s approach cannot adequately account for cases of unconscious perception, such as blindsight. We then suggested that both problems can be resolved by adopting a deflationary, Dennettian approach to the determination of consciousness, where multiple hypotheses can be simultaneously ‘probe-able’, but do not become conscious until they are actually probed, eliciting either a verbal or behavioural report. This approach is well-suited to the PEM framework, as both draw on similar bodies of empirical research for support (e.g. biased competition), and also share other structural similarities, such as the role played by attention (or probing) in determining the content of consciousness (or a report thereof). Finally, it seems to us that in order to properly accommodate consciousness and allow further empirical investigations (e.g. into its relation with other leading models of consciousness in neuroscience, such as the global neuronal workspace, e.g. Baars 1997; Dehaene and Naccache 2001), PEM will have to adopt a somewhat deflationary approach to consciousness, yielding similar results to the position advocated in this paper (i.e. positions similar to those occupied by Carruthers 2000 or Lau 2008), as failing to do this would mean that the framework will likely fail to deliver on its explanatory ambitions.