Introduction

Research on perception without awareness relies primarily on the dissociation paradigm, which compares two types of measurement (Reingold & Merikle, 1988). Indirect measures serve as indicators that a critical stimulus (e.g., a masked prime or a binocularly suppressed image) has been processed at all. Typical indirect measures are priming effects in response times. Direct measures are intended to measure visual awareness of the critical stimulus that produced the indirect effect. Typical direct measures are discrimination accuracy or visibility ratings.

Historically, most researchers have aimed for a simple dissociation between direct and indirect measures, which is observed when the indirect measure shows a clear nonzero effect while the direct measure indicates null sensitivity (Schmidt & Vorberg, 2006). Time and again, the dissociation paradigm has been attacked for seldom if ever demonstrating a simple dissociation convincingly (Eriksen, 1960; Holender, 1986; Meyen et al., 2020), even though rather convincing demonstrations of simple dissociations exist (e.g., Norman, Akins, Heywood, & Kentridge, 2014; Schmidt & Schmidt, 2010; Vorberg et al., 2003). In response to this problem, a minority of papers have aimed for a double dissociation pattern (Albrecht et al., 2010; Biafora & Schmidt, 2020; Lau & Passingham, 2007; Maniscalco, Peters, & Lau, 2016; Mattler, 2003; Merikle & Joordens, 1997; Vorberg et al., 2003). A double dissociation occurs when an experimental manipulation leads to an increase in performance in the indirect measure but a decrease in performance in the direct measure, or vice versa: for instance, an increase in priming effects over experimental conditions accompanied by a decrease in discrimination accuracy for the prime (Vorberg et al., 2003). Double dissociations are more powerful than simple ones because they do not require null sensitivity in the direct measure while also operating under milder measurement assumptions. They indicate that direct and indirect measures cannot both be monotonic functions of a single source of conscious information (Schmidt & Vorberg, 2006; Schmidt, 2007).
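The two patterns can be stated operationally. The minimal Python sketch below labels the means of one indirect and one direct measure observed in two experimental conditions. The function name and the `null_band` tolerance for "null sensitivity" are hypothetical, and a real analysis would rely on inferential statistics rather than raw means.

```python
def classify_dissociation(indirect, direct, null_band=0.1):
    """Label the pattern of one indirect and one direct measure over two conditions.

    indirect, direct: (condition_1, condition_2) mean scores, e.g., a priming
    effect in ms and prime discrimination in d' units.
    null_band: hypothetical tolerance within which the direct measure is
    treated as showing null sensitivity.
    """
    labels = []
    # Simple dissociation: a nonzero indirect effect while the direct measure is null.
    if any(i > 0 and abs(d) <= null_band for i, d in zip(indirect, direct)):
        labels.append("simple")
    # Double dissociation: the two measures change in opposite directions.
    change_indirect = indirect[1] - indirect[0]
    change_direct = direct[1] - direct[0]
    if change_indirect * change_direct < 0:
        labels.append("double")
    return labels


# Priming grows from 20 to 35 ms while prime d' drops from 1.2 to 0.6.
print(classify_dissociation((20.0, 35.0), (1.2, 0.6)))  # ['double']
```

Note that the double-dissociation test never looks at whether the direct measure is near zero, which is why it does not need the null-sensitivity assumption.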

The way awareness of the critical stimulus should be measured is a matter of debate. Two types of measures can be distinguished (Seth et al., 2008). Objective measures are responses to the critical stimulus that can be compared with the actual stimulus characteristics (e.g., its color or shape), and are therefore classifiable as correct or incorrect (e.g., yes-no detection or discrimination; two-alternative forced choice; recognition; identification). Subjective measures are reports of an internal state that cannot be validated externally (e.g., ratings of stimulus lightness, clarity of impression, or confidence in correct identification; Cheesman & Merikle, 1984, 1986; Reingold, 2004). The distinction between subjective and objective measures thus refers to the task mode rather than to the perceptual content of the measure. Several authors argue that subjective and objective measures can be equally sensitive, because they found that when participants reported no subjective visibility, their performance in an objective discrimination task was also at chance (e.g., Avneon & Lamy, 2018; Lamy et al., 2015, 2017; Peremen & Lamy, 2014; Ramsøy & Overgaard, 2004). Other authors, however, have found marked differences between the data patterns of objective and subjective measures (e.g., Biafora & Schmidt, submitted; de Graaf et al., 2012; Jannati & Di Lollo, 2012; Koster et al., 2020; Lau & Passingham, 2007).

In psychophysical procedures, objective and subjective measures are often used jointly, for example, when constructing a receiver-operating characteristic (ROC) that plots objective hit rates against false alarm rates across the levels of a subjective confidence rating. Signal detection theory (SDT; Green & Swets, 1966; Macmillan & Creelman, 2005) can be viewed as a model of the subjective experience of a stimulus that is present or absent in unavoidable noise; objective performance arises when the observer applies a criterion to this subjective evidence, which leads to discriminatory behavior. SDT thus makes room for subjective influences when separating sensitivity from response bias, and so do threshold-based models like the double high-threshold model (Malejka & Bröder, 2019). Nevertheless, despite the close connection of the two in psychophysical theory, some authors advocate the exclusive use of subjective measures, while others advocate the opposite.
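To make the SDT quantities concrete, the following sketch computes the standard equal-variance Gaussian indices, sensitivity d' = z(H) - z(F) and criterion c = -[z(H) + z(F)]/2, and turns confidence-rating counts into ROC points by cumulating responses from the most confident "present" rating downward. It assumes the ratings are ordered from "sure present" to "sure absent"; the function names and counts are made up for illustration. `NormalDist.inv_cdf` raises an error for rates of exactly 0 or 1, so such rates would need a correction first.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF (z-transform)


def sdt_indices(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: return sensitivity d' and criterion c."""
    zh, zf = _z(hit_rate), _z(fa_rate)
    return zh - zf, -(zh + zf) / 2


def roc_points(signal_counts, noise_counts):
    """Turn rating counts into cumulative (false alarm rate, hit rate) pairs.

    Counts are ordered from the most confident "present" rating (index 0)
    to the most confident "absent" rating (last index).
    """
    n_signal, n_noise = sum(signal_counts), sum(noise_counts)
    points, cum_signal, cum_noise = [], 0, 0
    # Each rating boundary acts as one criterion; the last boundary would give (1, 1).
    for s, n in zip(signal_counts[:-1], noise_counts[:-1]):
        cum_signal += s
        cum_noise += n
        points.append((cum_noise / n_noise, cum_signal / n_signal))
    return points


d_prime, criterion = sdt_indices(0.84, 0.16)
print(round(d_prime, 2), round(criterion, 2))
print(roc_points([30, 20, 10], [10, 20, 30]))
```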

Itinerary for this paper

The purpose of this paper is to clarify the roles of indirect, objective, and subjective measures in the dissociation paradigm and to draw conclusions for theories of visual awareness. We start by introducing the idea of a critical cue, the perceptual counterpart of the physical stimulus feature that generates the indirect effect, and argue that the critical cue constitutes an indispensable basis for any dissociation between direct and indirect measures. After showing that different direct measures can undergo surprising dissociations among themselves, we extend an important idea in psychophysical research: the concept of criterion content (Kahneman, 1968) as consisting of a set of perceptual cues. This is the starting point for our cue set theory (CST) of visibility measures in the dissociation paradigm. We explain how cues must be integrated to form measures of awareness, and use a simple mathematical formalization to redefine the concepts of exclusiveness, exhaustive validity, and exhaustive reliability of awareness measures. Next, we examine different patterns of overlap in the criterion contents of objective and subjective measures, as well as whether these criterion contents contain the critical cue, and show that neither class of measures can generally replace the other without sacrificing crucial information. After briefly discussing the validity of the popular Perceptual Awareness Scale (PAS), we evaluate claims about measurement properties that are frequently invoked in the literature. In the final part of the paper, we return to the empirical fact that different measures of awareness of the same stimulus may undergo double dissociations among themselves, meaning that one measure increases over experimental conditions while another decreases. We use Schmidt and Vorberg's (2006) mathematical methods to prove three propositions:

  1) "No single source for double dissociations": double dissociations among direct measures imply that they cannot all depend monotonically on the same single source of conscious information;
  2) "No simple theory of double dissociations": no theory that explains awareness in terms of a single monotonic process can explain double dissociations between direct measures; and
  3) "Explaining dissociated gradients": any theory of awareness needs to explain the variation in the entire multivariate set of awareness measures, especially if there are dissociations among them.
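The logic behind the first proposition can be previewed in a few lines. The following is a minimal sketch in our own illustrative notation (a single scalar source $a$ that varies over conditions, and two measures $M_1$ and $M_2$ that are both nondecreasing functions of it), not the formal derivation of Schmidt and Vorberg (2006):

```latex
\begin{aligned}
&\text{Assume } M_1(x) = g_1(a(x)),\; M_2(x) = g_2(a(x)),\; g_1, g_2 \text{ nondecreasing.}\\
&\text{Double dissociation between conditions } x, y:\; M_1(y) > M_1(x),\; M_2(y) < M_2(x).\\
&M_1(y) > M_1(x) \;\Rightarrow\; a(y) > a(x) \quad (\text{else } a(y) \le a(x) \Rightarrow g_1(a(y)) \le g_1(a(x))).\\
&M_2(y) < M_2(x) \;\Rightarrow\; a(y) < a(x) \quad (\text{else } a(y) \ge a(x) \Rightarrow g_2(a(y)) \ge g_2(a(x))).\\
&\text{Both cannot hold, so no single source } a \text{ with nondecreasing } g_1, g_2 \text{ exists.}
\end{aligned}
```

Because the contradiction uses only the ordering of $a$ across the two conditions, it does not depend on the particular shapes of $g_1$ and $g_2$.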

We end the paper with a proposal for a more modest view of visual awareness and unconscious perception: Instead of striving for sweeping theories of "consciousness," we advocate studying task dissociations not only between direct and indirect measures but also among different direct measures, and building theories that are specific enough to address the many differences between all those facets of conscious and unconscious vision. Without loss of generality, we focus on the domain of visual perception, but note that our theory can be extended to other sense modalities as well as to fields like implicit memory and learning, implicit decision making, and others.

A few words about the role and purpose of mathematical formalization in this paper. Our use of elementary mathematics (limited to basic set theory and the simple algebraic concept of monotonicity of functions) is not intended to flabbergast readers with complicated expressions for ideas that are already commonplace in consciousness science. Rather, we are trying to pinpoint those ideas by transforming them into clearly defined concepts that are specific enough to carry a mathematical proof. Doing this has three important advantages. First, a more explicit formulation reveals the scopes and limits of those concepts and how they are related to each other. Second, it gives critical readers the chance to examine the exact assumptions underlying our arguments, to attack our basic tenets by questioning the assumptions, and to arrive at new tenets by using alternative assumptions. Third, it helps prevent using important methodological concepts in a fuzzy, metaphorical way.

Criterion content and the critical feature

The concept of criterion content was introduced by Daniel Kahneman in his studies on metacontrast, a form of visual backward masking (Breitmeyer & Öğmen, 2006). In his famous review paper (Kahneman, 1968), he argues that participants in psychophysical experiments may use sources of information quite different from what the researchers expect (also see Hake et al., 1967, for an early quantitative approach to this problem). For instance, when asked to discriminate whether a masked prime is a square or a diamond, a participant may develop a strategy to monitor a particular spot on the screen, inferring that the prime was a square whenever she detects a flicker in that spot. That participant may successfully perform the task without ever consciously seeing the prime's shape: Her criterion content is based on flicker at a specific location, not on perceived shape. Kahneman stresses that to examine an observer’s criterion content, it is necessary to consider the phenomenology of the observations: “[…] a fuller description of the code that the subject uses in mapping his private experience onto responses to the experimenter’s questions” (Kahneman, 1968, p. 410). Let's examine this concept a little further.

Figure 1 shows the outline of a hypothetical response priming experiment where a prime is followed by a target at various stimulus-onset asynchronies. Because the inner contours of the target are adjacent to the prime contours, the target also serves as a metacontrast mask of the prime and can strongly reduce its visibility (provided that the colors are sufficiently desaturated; Schmidt, 2000). Primes and targets are red squares, green squares, red diamonds, and green diamonds. In two target identification tasks (TIDs; performed in different sessions), participants give a speeded response either to the shape of the target (shape TID) or to its color (color TID). In shape TID, the shape of the prime will activate the correct or incorrect response, resulting in longer response times when prime shape and target shape are inconsistent than when they are consistent. This priming effect is our indirect measure indicating processing of prime shape. In color TID, it is the color that will prime responses to the target, and that priming effect is an indirect measure indicating processing of prime color. Previous studies show that with two-dimensional stimuli and separate TID tasks like these, only the task-relevant feature primes the response, while the task-irrelevant feature does not affect response times, even though the stimulus material is identical in both tasks (e.g., Heinecke, 2000; Seydell-Greenwald & Schmidt, 2012; Tapia et al., 2010). What is important here is that even though both tasks use identical stimuli, the critical feature is different for each task. We define the critical feature as the physical stimulus distinction that drives the indirect effect – the difference between square and diamond primes in shape TID, and the difference between red and green primes in color TID. In other words, the critical feature is always implied by the processing requirements of the indirect task.
This is crucial for the logic of the dissociation paradigm: Any dissociation between an indirect and a direct measure (be it objective or subjective) is only meaningful when the direct task measures awareness of the critical feature; otherwise, there is a mismatch between the tasks (D-I mismatch; Schmidt & Vorberg, 2006). In our example experiment, shape TID thus requires a direct task asking for shape, and color TID requires a direct task asking for color. In contrast, a detection task (rather than a discrimination task) would fail to match either indirect task, because the priming effect is driven by the shape or color of the prime, not by its presence or absence (Reingold & Merikle, 1988).

Fig. 1

A hypothetical experiment. (a, b) Primes and targets differ in two dimensions, color or shape, and the target serves to mask the prime by metacontrast. (c) In two target identification tasks, participants respond either to the shape or to the color of the target, and priming effects from different prime types serve as indirect measures of shape or color processing, respectively. But how should we measure the visibility of the primes?

Both objective and subjective tasks can be used to measure awareness of the critical feature. An objective measure could directly ask the observer to indicate whether the prime was, for instance, red or green. A subjective measure could ask, "Rate the clarity with which you perceived the color of the prime." Both questions clearly address the critical feature, but only the objective one explicitly asks about its identity and can be compared with the actual stimulus.

How do we know what the critical feature is? When stimuli are as simple as in our example, the critical feature is usually well defined and unequivocal. With more complex stimuli, this is not always the case, and it may become an empirical question which stimulus aspects actually generate the indirect effect. Consider the problem of demonstrating affective priming by schematic face stimuli (smiley and frowny faces; e.g., Fenske & Eastwood, 2003; but compare Schmidt & Schmidt, 2013). In that field, researchers usually assume that the priming effect is based on an affective response to the prime or at least on its semantic processing, but Horstmann et al. (2006) argue that it is driven primarily by low-level visual features in the stimuli. In such a situation, a direct task asking for an affective evaluation of the prime would actually mismatch the indirect task. A direct task explicitly asking for the presence of low-level visual features, on the other hand, might be a better match. Unfortunately, it would no longer address the original research question, because the indirect task is invalid to begin with (confounded by low-level features).

Dissociations among multiple direct measures

The classical dissociation paradigm is usually discussed in terms of one indirect and one direct variable. However, when several direct measures are employed in the same experiment, surprising dissociations can occur among them. Lau and Passingham (2007) used masked squares and diamonds under metacontrast masking and compared an objective direct measure (percentage of correct discriminations) with a subjective one (percentage of "seen" ratings). They showed that subjective ratings could still differ when objective performance was equated. In the same vein, Sackur (2013) showed participants pairs of metacontrast events at different target-mask SOAs and asked them to rate their subjective similarity. He then used multidimensional scaling to argue that even if two metacontrast conditions lead to the same objective discrimination performance, their subjective appearance can still differ. Vorberg et al. (2003) presented participants with arrow primes masked by metacontrast and showed that while the ability to detect the prime increased with prime-mask SOA, the ability to discriminate the prime's pointing direction (which was the critical feature that generated the priming effect in the indirect companion task) remained at chance.

Most recently, Koster et al. (2020) further explored the possibilities of employing multiple direct measures. They presented square- or diamond-shaped primes (shown for 24 ms) that were followed by square- or diamond-shaped masks (shown for 108 ms). The prime-mask SOA was varied parametrically, ranging from 24 to 84 ms (much like our example experiment shown in Fig. 1, but with black shape stimuli on a white background). Metacontrast masking gives rise to a rich phenomenology of subjective percepts that depend on stimulus factors (timing, contrast, eccentricity, shape, relative energy) but also vary strongly between observers (Albrecht et al., 2010; Albrecht & Mattler, 2010, 2012, 2016). In a first experiment, the authors collected detailed verbal descriptions of what the observers experienced in the different experimental conditions. From these reports, they derived seven subjective direct measures to be used in the second experiment. In that experiment, participants were presented with all experimental conditions (2 primes × 2 masks × 6 SOAs) in six sessions (after an additional practice session). On each trial, participants indicated by a yes/no decision whether one particular percept had occurred (a subjective task). There was also an objective direct task in which participants tried to discriminate whether the masked prime was a square or a diamond. There was no indirect task. The results offer a singularly rich picture of the subjective experience of 24 well-trained observers, measured with high precision (Fig. 2).

Fig. 2

Hypothetical data inspired by Koster et al.'s (2020) study (there were additional awareness measures in their study). While a priming effect in response times (indirect measure) monotonically increases with prime-target stimulus-onset asynchrony (SOA), three subjective direct measures of awareness for the prime show a variety of patterns. The measure "target before mask" increases, while the measure "expansion" decreases. The behavior of the third measure, "rotation," depends on prime-target consistency: it increases in inconsistent trials only.

First of all, the objective prime discrimination measure (in d' units) showed that for most observers, performance was either a declining or a U-shaped function of SOA (the phenomenon of "type-B masking" for which metacontrast is famous; Breitmeyer & Öğmen, 2006; Kahneman, 1968). Only two observers showed an increase in performance with SOA, and two more observers performed at chance level throughout. Averaged across all observers, however, performance declined with SOA and leveled off at d' ≈ 0.5 (low, but clearly above chance).

Each of the subjective measures showed a similarly distinctive pattern, which often differed markedly from objective performance. The likelihoods of (1) perceiving a prime before the mask, (2) perceiving the prime as dark, and (3) perceiving no prime at all were distinctly U-shaped and markedly increased at longer SOAs, while the likelihood of (4) perceiving a bright prime was constant across SOAs. Interestingly, the likelihood of (5) perceiving rotation between prime and mask increased with SOA, but only when prime and mask were inconsistent in shape (none of the other measures showed this dependence on prime-mask consistency). Only the likelihoods of (6) perceiving the prime as filling out the mask and of (7) perceiving an expansion from prime to mask had the same declining time course as the objective measure.

The complex but regular data pattern reveals a number of dissociations between visibility measures. Some direct measures increase over the SOA range while others decrease or level off, and one subjective percept depends strongly on prime-mask consistency while the others do not. There are several instances where, over a range of SOAs, one measure increases while another decreases, so that the pair of them forms a double dissociation (Schmidt & Vorberg, 2006). Figure 2 shows a set of visibility measures in idealized form to facilitate our later arguments.
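With several direct measures, candidate double dissociations can be located by comparing the direction of change of every pair of measures between adjacent conditions. The following Python sketch does this on made-up, idealized proportions loosely modeled on the patterns described above (not Koster et al.'s data); the function name and the tolerance parameter are hypothetical. It compares means only, so any step it flags would still need statistical confirmation, and it does not capture interactions such as the consistency dependence of the rotation percept.

```python
from itertools import combinations


def opposite_steps(measures, tol=0.0):
    """Find adjacent-condition steps where one measure rises while another falls.

    measures: dict mapping a measure name to its mean values, one per condition,
    all in the same order (e.g., ascending SOA).
    tol: changes with absolute size <= tol count as "no change".
    Returns (name_a, name_b, k) tuples, where k is the step from condition k to k + 1.
    """
    def direction(delta):
        if delta > tol:
            return 1
        if delta < -tol:
            return -1
        return 0

    found = []
    for a, b in combinations(sorted(measures), 2):
        for k in range(len(measures[a]) - 1):
            dir_a = direction(measures[a][k + 1] - measures[a][k])
            dir_b = direction(measures[b][k + 1] - measures[b][k])
            if dir_a * dir_b == -1:  # one measure goes up, the other goes down
                found.append((a, b, k))
    return found


# Idealized, made-up proportions of "yes" reports at three ascending SOAs.
example = {
    "prime before mask": [0.1, 0.3, 0.5],
    "expansion": [0.6, 0.4, 0.2],
    "bright prime": [0.3, 0.3, 0.3],
}
print(opposite_steps(example))
# [('expansion', 'prime before mask', 0), ('expansion', 'prime before mask', 1)]
```

The constant "bright prime" measure never enters a double dissociation, because a flat measure has no direction of change to oppose.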

Criterion content as a set of cues

With this example of dissociated direct measures in mind, we are now ready to expand on Kahneman's (1968) concept of criterion content. We do that by redefining criterion content as a set of cues that an observer uses to perform the task (Fig. 3).

Fig. 3

Outline of Cue Set Theory (CST). Relations between one objective and one subjective measure in the dissociation paradigm. CST supposes that the indirect task implies a critical feature, $\varphi_c$, which is the stimulus feature that generates the effect in the indirect measure (e.g., priming of responses by prime shape in shape TID or by prime color in color TID). Dissociations between direct and indirect tasks can only be demonstrated on the basis of the critical feature; otherwise, there is a mismatch between tasks. The critical feature thus provides an anchor for the critical cue, $q_c$, which is defined as perceptual awareness of the critical feature. Objective and subjective direct tasks are performed on the basis of their respective criterion contents, $C_O$ and $C_S$, which are observer- and task-specific. Criterion contents are the sets of cues (shown as small squares) actually used to perform the respective task. Cues can be of diverse origin and need not be perceptual; they must be integrated to form the direct measures. Some of them may remain unused by either task (forming set $U$). Methodological debates revolve around the question whether or not direct tasks measure awareness of the critical feature (indicated by question marks). Note that this depiction only shows a special case in which the criterion contents partially overlap and the critical cue is contained in both of them.

While the critical feature ($\varphi_c$) is defined on the basis of stimulus differences (e.g., in physical features, category membership, or whichever distinction is driving the indirect effect), criterion content is based on sources of information (cues) within the cognitive system of an observer that are used to perform the direct task. Cues can be based on diverse sources of information. The critical cue ($q_c$) is the one that directly corresponds to visual awareness of the critical feature (e.g., awareness of prime shape in shape TID, awareness of prime color in color TID). But the critical cue is not necessarily the cue that the participant actually uses. What cues besides the critical cue can a participant use when trying to perform the direct task?

First of all, (1) auxiliary cues are perceptual cues other than the critical cue that can help in discriminating the prime, like a perceived flicker, a brightening or darkening, an expansion or rotation (Albrecht et al., 2010). In addition, (2) sensorimotor cues might arise from the response conflict induced by the prime, which is known to activate an initial motor response that can easily lead to a response error when the prime is inconsistent (Panis & Schmidt, 2016; Schmidt, 2000; Schmidt et al., 2006; Schmidt & Schmidt, 2009; Schmidt et al., 2010; Vorberg et al., 2003). If target ID and prime ID are performed on the same trial, observers may be able to monitor the speed and accuracy of their response, the perceived effort, or their initial tendency to respond. Such cues are especially useful in this situation: for instance, an error in the indirect task may lead the observer to infer that the prime was inconsistent with the target, enabling an informed guess of its identity (Biafora & Schmidt, 2020). Similarly, (3) decisional cues result from the perceptual decision process (measured, for instance, by confidence ratings or by type-II d' – a measure of how well observers are able to classify their perceptual decisions as correct or incorrect; Maniscalco & Lau, 2012; Zehetleitner & Rausch, 2013). (4) Fringe cues refer to hunches, gut feelings, fringe sensations, or other exotic sources of information. (5) Strategic cues are not experiential in nature but still might aid or harm performance in the direct task. They are not related to the stimulus or response but stem from prior knowledge (or assumptions) about the task.
For instance, participants may use prior information about the relative frequency of different primes or congruency conditions (educated guessing), or they might try to count different prime types and then pick the one that is more frequent (or less frequent, if the participant assumes that primes are drawn without replacement). Finally, some authors worry that the direct measure could be contaminated by (6) automatic cues: The prime could activate its associated response not only in the indirect but also in the direct task (Kiesel et al., 2006), which would lead to an overestimation of its visibility.

Now we are ready to expand Kahneman's notion of criterion content. In what follows, note that all definitions are formulated at the level of an individual observer.

Definition (i)

Criterion content. Let $q$ be a cue, and let $q_{Ti}$ be cue $i$ that a given participant actually uses to perform task $T$. Then the discrete and finite set $C_T = \{q \mid q = q_{Ti}\}$ forms the criterion content for this participant and task. Cues that are not part of any criterion content are in the set $U$ of unused cues, $U = \{q \mid q \notin C_T \text{ for all } T\}$. Objective direct measures, $D_O$, and subjective direct measures, $D_S$, are functions defined on their respective criterion contents: $D_O = f_O(C_O)$ and $D_S = f_S(C_S)$.
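Because Definition (i) is phrased in terms of sets, it maps directly onto Python's built-in `set` type. The sketch below assumes a known, finite universe of candidate cues for one observer; all cue labels (`q_c`, `flicker`, `confidence`, and so on) and helper names are hypothetical placeholders loosely following the cue types listed above.

```python
# Hypothetical cue labels for a single observer; none of these names come from the paper.
ALL_CUES = {"q_c", "flicker", "expansion", "motor_conflict", "confidence", "base_rate"}

# C_T for each task T: the cues this observer actually uses.
CRITERION_CONTENTS = {
    "objective": {"q_c", "flicker", "motor_conflict"},  # C_O
    "subjective": {"q_c", "confidence"},                 # C_S
}


def unused_cues(all_cues, criterion_contents):
    """U = {q | q not in C_T for all T}: cues outside every criterion content."""
    used = set().union(*criterion_contents.values())
    return all_cues - used


def shared_cues(criterion_contents):
    """Cues contained in every criterion content (here, the overlap of C_O and C_S)."""
    return set.intersection(*criterion_contents.values())


print(sorted(unused_cues(ALL_CUES, CRITERION_CONTENTS)))  # ['base_rate', 'expansion']
print(shared_cues(CRITERION_CONTENTS))                     # {'q_c'}
```

In this made-up configuration the criterion contents partially overlap and both contain the critical cue, matching the special case depicted in Fig. 3.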

It is reasonable to assume that the number of cues an observer actually uses in a given task is fairly small, so that criterion contents will rarely consist of more than two or three cues.

Note that our definitions allow objective and subjective measures to be based on identical criterion contents. It is a frequent misunderstanding that "subjective" content is best captured by a "subjective" measure, as if objective measures were somehow devoid of subjective content. The difference between objective and subjective measures lies solely in whether the observer’s responses can be compared with the external stimulus, not in the nature of the internal evidence on which they are based.

Direct measures integrate the cues in their criterion contents

We just defined direct measures as functions of the criterion content; we now outline possible functions. We assume that in order to perform a psychophysical task, observers have to integrate the cues in their task-specific criterion content (Anderson, 1992; Marks & Algom, 1998). Generally, integration can be accomplished in many ways, given that cues can differ in their scaling properties (e.g., they may form indicator, ordinal, interval, or ratio scales; they may also be vector-valued, like color coordinates). For the sake of illustration, let us assume that all the cues in the criterion content are real-valued random variables coded such that larger values denote more evidence for the information addressed by the cue, and that an observer is trying to maximize her performance in an objective direct task (for typical experiments, that means maximizing response accuracy in identifying the critical feature). How to do this optimally, that is, so as to maximize the reliability of the integrated measure, is a classical problem in mathematical statistics (Cochran, 1937). Under the assumption that the cues are uncorrelated, it is optimal to weight the cues according to their reliabilities, i.e., their ability to predict the critical feature (e.g., Drewing & Ernst, 2006; Landy et al., 1995; Oruç et al., 2003).
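As an illustration of reliability-weighted integration, the sketch below assumes that each cue is an independent, unbiased Gaussian estimate with a known variance, and operationalizes reliability as inverse variance, as is common in the cue-combination literature (e.g., Landy et al., 1995). This is a narrower notion than the "ability to predict the critical feature" used in the text, and the function names are hypothetical.

```python
def reliability_weights(variances):
    """Inverse-variance weights: reliability r_i = 1 / sigma_i^2, normalized to sum to 1."""
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]


def integrate_cues(values, variances):
    """Reliability-weighted average of independent, unbiased cue estimates.

    Returns the combined estimate and its variance, which is never larger
    than the variance of the most reliable single cue.
    """
    weights = reliability_weights(variances)
    combined = sum(w * q for w, q in zip(weights, values))
    combined_variance = 1.0 / sum(1.0 / v for v in variances)
    return combined, combined_variance


# Two hypothetical cues: a reliable one (variance 1) and a noisy one (variance 4).
print(reliability_weights([1.0, 4.0]))  # [0.8, 0.2]
print(integrate_cues([2.0, 7.0], [1.0, 4.0]))
```

The combined variance (0.8 here) is smaller than that of either cue alone, which is why adding a useful cue to the criterion content can only help under these assumptions.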

But optimizing a measure's reliability is only one way of integrating the cues in a criterion content; there could be radically different criteria for integration. In the psychological literature on heuristic decision making, many integration schemes are discussed (Gigerenzer & Gaissmaier, 2011). For example, an observer may have many cues available to her but choose to restrict her entire criterion content to only one cue (tantamount to setting its weight to 1 and all others to 0). If this is the critical cue, we call this measure exclusive for the critical cue. This property does not guarantee that the criterion content allows for optimal performance (because additional cues might have led to further improvement), but it means that the critical cue is the sole basis of performance and that the measure is free of contamination by other sources of information.
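Restricting the criterion content to one cue can be pictured as a weight vector with a single nonzero entry. The sketch below assumes a linear integrator, which is only one of many possible integration schemes; the helper names and the cue labels `q_c` and `flicker` are hypothetical. It shows that an exclusive measure is unaffected by changes in any other cue.

```python
def linear_measure(cues, weights):
    """A simple linear integrator: sum of weight * value over the cues."""
    return sum(weights.get(name, 0.0) * value for name, value in cues.items())


def is_exclusive_for(weights, critical="q_c"):
    """True if the critical cue is the only cue with a nonzero weight."""
    return {name for name, w in weights.items() if w != 0} == {critical}


exclusive = {"q_c": 1.0, "flicker": 0.0}
mixed = {"q_c": 0.7, "flicker": 0.3}

# Changing the flicker cue leaves the exclusive measure untouched.
print(linear_measure({"q_c": 0.4, "flicker": 2.0}, exclusive))   # 0.4
print(linear_measure({"q_c": 0.4, "flicker": -1.0}, exclusive))  # 0.4
print(is_exclusive_for(exclusive), is_exclusive_for(mixed))      # True False
```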

While objective direct measures may be optimized with respect to objective performance, no such external criterion exists for subjective measures. Subjective measures could aim to optimize internal criteria instead, like confidence in a decision (Hellmann, Zehetleitner, & Rausch, 2023; Locke et al., 2022; Zehetleitner & Rausch, 2013). As a result, subjective measures can vary a lot in the range of awareness levels they can cover, as well as in how they respond to a state of unawareness (Wierzchoń et al., 2012). An example of a subjective integration criterion would be an observer who has no perceptual cues available to her but a subjective feeling of whether or not her decision was correct. Occasionally, observers in masked prime discrimination tasks report monitoring their initial motor impulses (they "follow where their finger wants to go"; see Kiesel et al., 2006). These observers seem to optimize the perceived difference between the initial motor impulse and the subsequent discrimination response; we privately call this the "Zen Mode" of Prime ID. Still other observers might be content with a measure that keeps the task comfortable and minimizes the perceived effort invested (e.g., the occasional negligent participant who always presses the same key). Importantly, nothing would prevent participants from using subjective integration criteria like these in objective tasks as well.

Ultimately, performance in a direct task $T$ is determined both by the specific cues in the task's criterion content, $C_T$, and by the manner of their integration, $f_T(C_T)$. It is therefore difficult to say whether there is any optimal set of cues or any optimal integration function for a given observer, because it is possible that a given combination of criterion content and integration function might be outperformed by some other combination. We also have to deal with the possibility that an observer may integrate a criterion content in such a twisted way that the measure changes sign with respect to the information provided by a cue, such that the measure decreases when the evidence in question actually increases. The following definitions will help to exclude such cases from further consideration.

Definition (ii)

Monotonic and exhaustive integrators. Assume a measure $M$ with criterion content $C_M = \{q_1, q_2, \ldots\}$ and all cues $q_i$ coded such that larger values indicate stronger evidence. $M$ is a monotonic integrator of $C_M$ if, for any cue $q_i$ and all other cues remaining equal, $q_i' \ge q_i$ implies $M(\ldots, q_i', \ldots) \ge M(\ldots, q_i, \ldots)$. $M$ is an exhaustive integrator of $C_M$ if strict inequalities hold, such that for any cue $q_i$ and all other cues remaining equal, $q_i' > q_i$ implies $M(\ldots, q_i', \ldots) > M(\ldots, q_i, \ldots)$.
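Definition (ii) can be probed numerically: raise one cue at a time while holding the others fixed, and check whether the measure ever drops (violating monotonic integration) or ever stays flat (violating exhaustive integration). The sketch below is a hypothetical helper that treats the measure as a deterministic function of the cues, standing in for its expected value; the grid and step size are arbitrary choices, so the check can refute but never prove either property.

```python
from itertools import product


def classify_integrator(measure, n_cues, grid=(0.0, 0.5, 1.0), step=0.25):
    """Probe Definition (ii) on a finite grid of cue values.

    measure: function mapping a tuple of cue values (larger = more evidence)
    to the expected value of the measure.
    Returns "exhaustive" if raising any single cue always strictly raised the
    measure, "monotonic" if it never lowered it, and "non-monotonic" otherwise.
    """
    strict = True
    for point in product(grid, repeat=n_cues):
        base = measure(point)
        for i in range(n_cues):
            raised = list(point)
            raised[i] += step  # raise one cue, keep all others equal
            new = measure(tuple(raised))
            if new < base:
                return "non-monotonic"
            if new == base:
                strict = False
    return "exhaustive" if strict else "monotonic"


print(classify_integrator(sum, 2))                    # exhaustive
print(classify_integrator(max, 2))                    # monotonic
print(classify_integrator(lambda q: 0.5, 2))          # monotonic (constant responder)
print(classify_integrator(lambda q: q[0] - q[1], 2))  # non-monotonic
```

The constant measure corresponds to the participant who always gives the same response: it satisfies monotonic but not exhaustive integration. The `max` example shows that a measure can draw on every cue and still fail to be exhaustive.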

Whether or not an integrator is exhaustive is an all-or-none property; it makes no sense to state that one measure is more exhaustive than another. We will see shortly that the strict inequalities required for exhaustive integrators lead to the classical "exhaustiveness problem" of the dissociation paradigm (Reingold & Merikle, 1988). Note that the properties of monotonic and exhaustive integration only need to hold in the long run, at the level of expected values. Also note that a measure that does not respond to changes in its criterion content at all already satisfies the requirement of monotonic integration (e.g., a participant who always gives the same response). Monotonic integration is only violated when a measure systematically (i.e., in the long run) responds in reverse to the cues in its criterion content. We shall assume throughout the paper that all direct measures considered are monotonic integrators.

A new look at exhaustiveness and exclusiveness

Reingold and Merikle (1988) argue that a direct measure of visual awareness in the dissociation paradigm should have two properties. First, it should be exhaustive for visual awareness, which means that all theoretically relevant aspects of visual awareness are covered by the measure. This is a logical requirement for interpreting simple dissociations: Zero sensitivity or chance performance in the direct measure can only imply the absence of awareness if it is certain that no aspects of awareness escape measurement. Second, the direct measure should be exclusive for visual awareness. Even though this is not a logical requirement for the dissociation paradigm (see the mathematical appendix in Schmidt & Vorberg, 2006, where the exclusiveness assumption is never needed), it is a desirable property because a non-exclusive direct measure could be contaminated by unconscious information (e.g., automatic cues; Kiesel et al., 2006). CST allows us to formulate these properties more specifically.

Actually, exhaustiveness turns out to have two aspects: the validity of a direct measure's criterion content (i.e., which cues are integrated), and the reliability of the direct measure that results from the integration (cf. the distinction between information criteria and sensitivity criteria for direct measures in Shanks & St. John, 1994). Indeed, a direct measure can only be called exhaustive if it is both exhaustively valid and exhaustively reliable.

For a direct measure to be exhaustively valid, all theoretically relevant cues need to be part of its criterion content. Exhaustive validity usually requires the critical cue, but may also call for some additional cues (e.g., decision confidence). Whenever the criterion content fails to include a cue that could help predict the critical feature, it is possible that this neglected cue is precisely the one that generates nonzero performance in the indirect measure. The direct measure might then show null sensitivity only because it misses this crucial cue. Formally, exhaustive validity requires that ΘT ⊆ CT, where ΘT is the set of all cues that are theoretically relevant in task T and CT is the criterion content. Note that what constitutes a “theoretically relevant” cue depends on the substantive research question: In contexts like masked priming, it may be desirable that the direct measure contains only perceptual cues, while in contexts like intuitive decision making, decisional and fringe cues are of theoretical interest.

The reliability aspect of exhaustiveness was introduced by Reingold and Merikle (1988) and further investigated by Schmidt and Vorberg (2006) in their study of different types of dissociations between direct and indirect measures. They showed that the assumption of exhaustive reliability postulates a psychophysical measure that is a strictly monotonic function of conscious information (here, of all theoretically relevant cues in the criterion content). Strict monotonicity means that the measure is able to detect any increase in conscious information, however small, like an infinitely sensitive barometer that never "hangs". In other words, an exhaustively reliable measure must be an exhaustive integrator of its criterion content. In contrast, a measure that is merely a monotonic integrator but not an exhaustive one may fail to detect an actual increase in awareness: there may be some cue that increases in value without the direct measure picking it up. The problem with exhaustive reliability is that it requires an infinitely sensitive measure. Even though this property only needs to hold in the long run (at the level of expected values), it is clearly an untenable assumption in a psychophysical context. Fortunately, Schmidt and Vorberg (2006) show that only simple dissociations require exhaustive reliability, while double dissociations do not. For instance, if the indirect measure increases under experimental manipulation while the direct measure decreases, the direct measure is no longer required to show zero sensitivity, and integrators only need to be monotonic, not exhaustive.

Finally, the concept of exclusiveness (Reingold & Merikle, 1988) can also be redefined under CST. We call a direct measure exclusive for theoretically relevant cues when its criterion content only includes theoretically relevant cues and excludes all others. This requirement is met when CT ⊆ ΘT. A special case applies when a measure is exclusive for the critical cue, i.e., when CT = {qc} for some task T. Such a measure not only needs to include the critical cue, but also to exclude all others. We can summarize the foregoing in the following definitions, keeping in mind that we should distinguish between theoretically relevant cues and additional, irrelevant ones:

Definition (iii)

Exclusiveness. Let ΘT = {r1, r2, …} be the set of all cues that are theoretically relevant for a specific task T, and let qc be the critical cue, qc ∈ ΘT. A criterion content that only consists of the critical cue, CT = {qc}, is exclusive for the critical cue. A criterion content that only consists of theoretically relevant cues, CT ⊆ ΘT, is exclusive for theoretically relevant cues.
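Because Definition (iii) is purely set-theoretic, it can be written down directly with Python sets. The sketch below is illustrative only: the function name `exclusiveness` is hypothetical, the cue labels (`prime_color`, `color_confidence`, `response_bias`) are invented for the color task of the example experiment, and treating `color_confidence` as theoretically relevant is an assumption.

```python
def exclusiveness(criterion_content, relevant, critical):
    """Apply Definition (iii) to a criterion content CT.

    criterion_content: cues the observer actually integrates (CT)
    relevant: cues that are theoretically relevant for the task (ΘT)
    critical: the critical cue qc, which must be an element of ΘT
    """
    C, theta = set(criterion_content), set(relevant)
    if critical not in theta:
        raise ValueError("the critical cue must be theoretically relevant")
    return {
        "exclusive_for_critical_cue": C == {critical},  # CT = {qc}
        "exclusive_for_relevant_cues": C <= theta,      # CT ⊆ ΘT
    }


# Hypothetical cues for the color task; qc is the perceived prime color
theta = {"prime_color", "color_confidence"}
print(exclusiveness({"prime_color"}, theta, "prime_color"))
print(exclusiveness({"prime_color", "color_confidence"}, theta, "prime_color"))
print(exclusiveness({"prime_color", "response_bias"}, theta, "prime_color"))
```

The first criterion content is exclusive in both senses, the second only for theoretically relevant cues, and the third in neither sense, because the strategic cue `response_bias` lies outside ΘT.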

Definition (iv)

Exhaustiveness. Assume a direct measure DT with a criterion content CT and with all its cues coded such that larger values indicate stronger evidence. Let RT denote the set of cues that are both theoretically relevant and part of the criterion content, RT ⊆ ΘT and RT ⊆ CT, and let ST be the set of all remaining cues in the criterion content, so that CT consists of a set of theoretically relevant and a set of theoretically irrelevant cues, CT = {RT, ST} = {…, ri, …; …, si, …}.

  a) DT is exhaustively valid if its criterion content contains all theoretically relevant cues, ΘT ⊆ CT.

  b) DT is exhaustively reliable if it is an exhaustive integrator of the theoretically relevant cues in its criterion content, so that, all other cues remaining equal, ri' > ri implies DT(..., ri', ...; …, si, ...) > DT(..., ri, ...; …, si, ...) for all ri, ri', si ∈ CT.

  c) DT is exhaustive if it is both exhaustively valid and exhaustively reliable.

Note that the critical cue, if it exists, is always part of the set of theoretically relevant cues. Therefore, exhaustive validity of a direct measure implies that the critical cue is part of its criterion content, and exhaustive reliability means that the direct measure is (ceteris paribus) a strictly monotonic function of the critical cue.

Again, the strict inequality in the definition means that an exhaustively reliable measure will respond to any change, however small, in the theoretically relevant cues in its criterion content, which guarantees that a nonresponse of the measure implies a nonresponse in all theoretically relevant cues. Also observe the interesting logical relationship between exclusiveness and exhaustive validity: If a criterion content is exclusive for theoretically relevant cues, CT ⊆ ΘT; if it is exhaustively valid, ΘT ⊆ CT; and if it has both properties, CT = ΘT. Note that our definition allows for additional, theoretically irrelevant cues si as long as they do not spoil the monotonic integration of the relevant cues ri.
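Definition (iv) can be sketched in the same spirit. In the following Python example, which is illustrative only, exhaustive validity is a subset test and exhaustive reliability is checked numerically on a finite grid, so a `True` result only means that no violation was found at the tested points. The function names, cue names, grid, and both example measures are hypothetical.

```python
import itertools


def exhaustively_valid(cues, relevant):
    """Definition (iv a): every theoretically relevant cue is used, ΘT ⊆ CT."""
    return set(relevant) <= set(cues)


def exhaustively_reliable(measure, cues, relevant, grid, step=0.5):
    """Definition (iv b), checked on a finite grid of cue values.

    Raising any relevant cue in the criterion content (all other cues fixed)
    must strictly increase the measure; irrelevant cues are not tested.
    """
    tested = [i for i, cue in enumerate(cues) if cue in relevant]  # cues in RT
    for point in itertools.product(grid, repeat=len(cues)):
        for i in tested:
            raised = list(point)
            raised[i] += step
            before = measure(dict(zip(cues, point)))
            after = measure(dict(zip(cues, raised)))
            if not after > before:
                return False
    return True


# Hypothetical criterion content: two relevant cues and one irrelevant cue
cues = ["prime_color", "color_confidence", "flicker"]
theta = {"prime_color", "color_confidence"}
additive = lambda c: c["prime_color"] + c["color_confidence"]  # ignores flicker
saturating = lambda c: max(c["prime_color"], c["color_confidence"])

print(exhaustively_valid(cues, theta))                             # True
print(exhaustively_reliable(additive, cues, theta, [0.0, 1.0]))    # True
print(exhaustively_reliable(saturating, cues, theta, [0.0, 1.0]))  # False

# Exclusive for relevant cues and exhaustively valid together imply CT = ΘT
content = {"prime_color", "color_confidence"}
both = content <= theta and exhaustively_valid(content, theta)
print(both, content == theta)                                      # True True
```

The additive measure ignores the irrelevant cue `flicker` and is still exhaustively reliable, in line with the remark that additional irrelevant cues are allowed. The saturating measure fails because raising the weaker relevant cue can leave it unchanged.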

Choosing direct measures that capture the critical cue

We can now apply this classification to concrete measures. For instance, are there measures that are exclusive for the critical cue, i.e., respond only to qc and to nothing else? To be a plausible candidate for this remarkable property, such a measure will have to ask directly for the critical feature, because it is the one that drives the indirect effect. It is usually easy to formulate such candidate measures, both in their objective and their subjective variants. For example, in the color task of our example experiment it is the difference in prime color (red or green) that defines prime-target consistency and thus defines the priming effect. This implies that the objective task most likely to be exclusive for the critical cue would be discrimination of the prime as red or green, of course using the same stimuli as the indirect task. A corresponding subjective measure would ask whether the observer perceived the prime as red or green, but in a way that does not force the participant to pick any one color (because this would turn the measure into an objective one). Some possibilities are the following: "Could you see whether the prime was red or green? Answer yes or no"; "Please rate how clearly you saw that the prime had one color rather than the other." Note that those questions ask specifically about the distinction between the two colors, but that the answers cannot be classified as correct or incorrect. Admittedly, these examples of subjective measures seem contrived, which is why we see the value of subjective measures mostly when it comes to facets of awareness not directly based on the critical feature.

A very interesting sort of direct measure is a bipolar rating scale, with one pole marked as "I clearly saw that the prime was red" and the other pole as "I clearly saw that the prime was green," with the various degrees of clarity in between. This is a hybrid measure that unites an objective and a subjective measurement: Choosing the “red” or “green” half of the scale is an objective discrimination task using criterion content CO, and choosing the magnitude of the rating is a subjective task using criterion content CS. If the wording is specific enough, this is an attractive candidate for a task that is both an exclusive objective task and an exclusive subjective task, in which case CO = CS = {qc}. The popular bipolar scale that uses confidence instead of clarity ratings is of course another example of a hybrid measure, but one where CO and CS are hoped to give independent rather than concordant pieces of information.
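The hybrid character of the bipolar scale can be illustrated by decomposing each rating into its two components. This is a sketch under assumptions not made in the text: a hypothetical six-point scale coded 1 to 6, with 1 = "I clearly saw that the prime was red," 6 = "I clearly saw that the prime was green," and no neutral midpoint. The function name `split_bipolar_rating` is also hypothetical.

```python
def split_bipolar_rating(rating, n_points=6):
    """Split a bipolar rating into an objective and a subjective component.

    Assumes an even number of scale points coded 1..n_points, where the lower
    half means "red" and the upper half means "green".
    Returns (objective_response, subjective_clarity); clarity runs from 1
    (next to the middle of the scale) to n_points // 2 (at either pole).
    """
    if n_points % 2 or not 1 <= rating <= n_points:
        raise ValueError("expected an even-length scale and a rating in range")
    half = n_points // 2
    if rating <= half:
        return "red", half - rating + 1  # 1 -> clarity 3, 3 -> clarity 1
    return "green", rating - half        # 4 -> clarity 1, 6 -> clarity 3


for r in range(1, 7):
    print(r, split_bipolar_rating(r))
```

Only the objective component can be scored as correct or incorrect against the actual prime color (e.g., to compute discrimination accuracy); the clarity component cannot, which is what makes it the subjective half of the measure.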

The problem is that in order to know that a task is exclusive for the critical cue, we have to exclude the possibility that any cue except the critical one influences the behavior of the observer. Given the large number and idiosyncratic nature of possible task strategies, this seems difficult to do. However, observers can be trained to adjust their criterion content (or its manner of integration) to the experimenters' instructions (Koster et al., 2020). After all, it is reasonable to assume that the number of cues an observer actually uses in a given task is fairly small and that observers are able to shift their attention between alternative cues.

Overlap between criterion contents

Our set-theoretic representation now allows us to compare objective and subjective measures with criterion contents CO, CS, which are both intended to measure awareness in a dissociation paradigm. We have seen that the indirect task defines a critical feature, and therefore the classical paradigm requires at least one direct measure that includes the critical cue to avoid a mismatch between direct and indirect tasks. The decisive questions become: Where is the critical cue in relation to the criterion contents? And if an objective and a subjective task are employed in tandem, what are their respective roles in a possible dissociation? We have already treated the case that the criterion contents CO, CS of an objective and a subjective measure are identical. Now we discuss the remaining scenarios (Fig. 4).

Fig. 4

The concepts of one criterion content covering another (a, c), one criterion content outmatching another (b, d), two criterion contents not covering each other (e), and two criterion contents being disjoint (f)

  • Figure 4a, b: If the criterion content for the subjective measure completely includes the criterion content for the objective measure, CO ⊆ CS, CS is said to cover CO. In that case, all the cues that could be used to optimize the objective measure are also available for the subjective measure (Fig. 4a). As a special case, both measures may use the same cues, CO = CS. Another special case occurs if the criterion content of the objective measure is a proper subset of the subjective one, CO ⊂ CS, in which case we say that CS outmatches CO. When this occurs, the subjective measure can use all the cues available for the objective measure, but not vice versa. In other words, there are cues being used that are unique to the subjective measure (Fig. 4b). Subjective measures are interesting precisely because there are uniquely subjective cues, such as stimulus clarity or decision confidence, that have no counterpart in objective measures.

  • Figure 4c, d: These concepts apply symmetrically for objective and subjective measures. In Fig. 4c, CO covers CS, and all the cues that could optimize the subjective measure are also available for the objective measure. In Fig. 4d, CO outmatches CS: the objective measure can use all the cues that could optimize the subjective measure, but not vice versa. Examples of objective measures that rely on cues unique to them are same-different tasks where the difference between stimulus conditions is difficult to verbalize, the study of differential behavior in animals who cannot provide subjective measures, or the observation of differential sucking rates in infants.

  • Figure 4e: Each criterion content has elements that are not included in the other, CO \ CS ≠ Ø and CS \ CO ≠ Ø. In this case, CO and CS do not cover each other. Each of them uses cues that are unique to that measure, and neither can outmatch the other.

  • Figure 4f: If the two criterion contents have no cue in common, CS ∩ CO = Ø, they are disjoint and the tasks are performed on the basis of entirely different sources of information. In that case, the subjective measure cannot use any of the cues that could optimize performance in the objective measure, and vice versa. Of course, disjoint criterion contents imply that CS does not cover CO and vice versa.
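The relations in Fig. 4 reduce to standard set comparisons. The Python sketch below classifies a pair of criterion contents; the function name and cue labels are hypothetical, and both contents are assumed to be non-empty, because an empty criterion content would be trivially covered by any other. Disjointness is reported as the more specific description of the non-covering case (Fig. 4f rather than Fig. 4e).

```python
def compare_criterion_contents(CO, CS):
    """Classify the relation between two non-empty criterion contents (Fig. 4)."""
    CO, CS = set(CO), set(CS)
    if not CO or not CS:
        raise ValueError("criterion contents are assumed to be non-empty")
    if CO == CS:
        return "identical: CO and CS cover each other"
    if CO < CS:  # proper subset
        return "CS outmatches CO (Fig. 4b)"
    if CS < CO:
        return "CO outmatches CS (Fig. 4d)"
    if CO.isdisjoint(CS):
        return "disjoint (Fig. 4f)"
    return "neither covers the other (Fig. 4e)"


# Hypothetical cue labels
print(compare_criterion_contents({"color"}, {"color", "confidence"}))
# CS outmatches CO (Fig. 4b)
print(compare_criterion_contents({"color", "flicker"}, {"color", "confidence"}))
# neither covers the other (Fig. 4e)
print(compare_criterion_contents({"flicker"}, {"confidence"}))
# disjoint (Fig. 4f)
```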

Where is the critical cue?

It is clear that the critical cue can be contained in CO but not in CS, in CS but not in CO, in both, or in neither, and there is the additional possibility that it does not even exist (Fig. 5). From the representations in Figs. 4 and 5, many special cases can be constructed. For example, if CO and CS are disjoint, at most one of them can contain the critical cue, and the other one is guaranteed not to contain it. If CO contains the critical cue and CS covers CO, then CS contains it too; and so on. In the following, we are looking at a number of scenarios where an objective and a subjective task are used together.

  • Case A: The critical cue is part of CO but not of CS (qc ∈ CO \ CS). In this case, only the objective but not the subjective measure can form a dissociation with respect to the indirect task. The subjective measure either fails to ask about the critical cue, or observers execute it in a way that circumvents the use of the critical cue. For example, a subjective measure in our example experiment might ask the observer, "Please rate how clearly you perceived the prime." This instruction leaves the criterion content to the observer and is not focused specifically on the critical feature. If prime color is the critical feature, participants might perform the rating on the basis of their ability to detect the prime's presence or absence (e.g., its perceived brightness, darkness, or flicker) without looking out for its color. Later, we will see that applications of the Perceptual Awareness Scale (PAS; Ramsøy & Overgaard, 2004) have the problem that the unspecific questioning does not ensure that the subjective measure contains the critical cue.

Fig. 5

The critical cue can be part of CS, CO, both, or neither. It may not even exist

If the criterion content of the subjective measure does not contain the critical cue, does that mean the measure is useless? Not at all, because it can provide valuable information beyond that needed to establish the dissociation. The paradigm case is the concomitant use of an objective prime discrimination judgment with a subjective confidence rating, as in the construction of an ROC curve.

Remember that the effectiveness of a measure jointly depends on the criterion content and on the way the cues in the criterion content are integrated. Even if CO does contain the critical cue, there is no guarantee that the observer will use it effectively or exclusively.

  • Case B: The critical cue is part of CS but not of CO (qc ∈ CS \ CO). It is difficult to find an example where such a combination of tasks would be employed deliberately; it rather arises in cases where the objective direct measure is misspecified. One such example would be the inappropriate use of a detection task where a discrimination task would be in order. In signal detection theory, discrimination and detection can be modeled within the same decision space, but the criteria for the two tasks can be orthogonal (Macmillan, 1986): while discrimination requires a criterion that separates signal A from signal B, detection requires a criterion that separates both A and B from noise (also see Snodgrass et al., 2004). According to this model, it is possible both to detect a stimulus without being able to discriminate it (as in many cases of visual masking) and to discriminate it without being able to detect it (as in blindsight). Note, however, that Macmillan’s (1986) model assumes that the subjective evidence for detection and discrimination is based on the same metric and can be described within a single two-dimensional space, an assumption that is called into question if both tasks are based on different criterion contents.Footnote 2

  • Case C: The critical cue is part of both CO and CS (qc ∈ CO ∩ CS). This is the situation where objective and subjective direct tasks can give convergent information in the dissociation paradigm. For instance, the objective task may ask observers to discriminate the shape of the prime, and the subjective measure may ask them to rate the clarity of their shape impression. (Compare this with the foregoing example where the subjective measure was a confidence judgment.) Again, note that even if the critical cue is contained in the criterion content of a given task, there is no guarantee that an observer makes optimal use of it.

  • Case D: The critical cue is not used, qc ∉ CO ∪ CS, or does not exist. If the critical cue is not used in either measure, it is possible that it is in principle inaccessible to the observer, that both measures are misspecified as discussed above, or that the observer is not following instructions. A classical case where the critical cue, in all likelihood, does not exist is research on “extrasensory perception.” For instance, in an experiment where participants try to receive a telepathic image, there probably is no critical cue because there is no telepathy and thus no critical feature. The task may still be performed with some success, but only on the basis of strategic cues (e.g., educated guessing of motifs that are likely to be "transmitted").
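The four cases amount to a membership test for qc. In the Python sketch below, the function name and the cue labels are hypothetical, and, as an assumption of this sketch, `None` represents a task for which no critical cue exists. The example reproduces Case A, where an unspecific clarity rating is performed on brightness and flicker rather than on prime color.

```python
def locate_critical_cue(CO, CS, critical=None):
    """Return the case (A-D) for objective content CO and subjective content CS."""
    if critical is None:
        return "D"  # no critical cue exists
    in_o, in_s = critical in CO, critical in CS
    if in_o and in_s:
        return "C"  # qc ∈ CO ∩ CS
    if in_o:
        return "A"  # qc ∈ CO \ CS
    if in_s:
        return "B"  # qc ∈ CS \ CO
    return "D"      # qc ∉ CO ∪ CS: the critical cue is not used


# Case A: color discrimination vs. an unspecific clarity rating
CO = {"prime_color"}
CS = {"prime_brightness", "flicker"}
print(locate_critical_cue(CO, CS, "prime_color"))  # A
```

Because the test is plain set membership, the consequences noted above follow directly: if CO and CS are disjoint, Case C cannot occur, and if CS covers CO, Case A cannot occur.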

One of the issues that can be reevaluated in light of CST is the distinction between objective and subjective thresholds. Cheesman and Merikle (1984, 1986) were the first to distinguish between objective thresholds of awareness (based on performance in objective tasks) and subjective thresholds, defined as "the prime-mask SOA at which an observer consistently claimed to detect the primes at a chance level of accuracy" (p. 352). It is generally assumed that subjective thresholds are lower than objective ones, in the sense that stronger masking is necessary to keep stimuli beneath the objective threshold than beneath the subjective one (Snodgrass et al., 2004). Of course, this assumption requires that the measures can be ordered in terms of their sensitivity within some kind of decision space. But from the viewpoint of CST, establishing the relative sensitivity of two measures is not trivial because it would depend jointly on the amount of overlap between the respective criterion contents, CO and CS, and on their modes of integration. Remember that depending on how the tasks are specified, the two criterion contents may be quite dissimilar, and any attempt to order them may not be meaningful (Zehetleitner & Rausch, 2013). The only thing we can safely say at this point is that both CO and CS would need to include the critical cue, or else a dissociation with a third, indirect measure could not be established. Beyond that, however, a theory of relative sensitivity of direct measures is a task for the future.

Ironically, the successful establishment of a simple dissociation may lead to a situation where the critical cue becomes inaccessible to the observer, so that it becomes questionable whether it is still part of the criterion content of the direct task. Under visual masking, for instance, it can be very difficult or even impossible to discriminate a prime's shape or color. If that is the case in all conditions of the experiment (e.g., Schmidt & Schmidt, 2010), the observer has no choice but to resort to other cues to perform the task (e.g., auxiliary perceptual cues), working around the critical cue. It is therefore wise to mix low-visibility conditions with other conditions where the critical cue is actually helpful to ensure that the participant is still on task. Another alternative is to aim for a double dissociation where complete masking is not required (Biafora & Schmidt, 2020).

Is the Perceptual Awareness Scale valid?

The PAS (Ramsøy & Overgaard, 2004) is a widely used subjective measure of visual awareness and is presented in its original form in Table 1. Is the scale a valid direct measure of awareness? It should be clear by now that in the dissociation paradigm, the validity of any direct measure can only be assessed in the context of the indirect task. If the indirect effect is driven by a critical feature, the direct measure must ask about that feature's representation in visual awareness, the critical cue. Otherwise, arguments for dissociation fall short because direct and indirect tasks have different objects of measurement.

Table 1 The Perceptual Awareness Scale (Ramsøy & Overgaard, 2004)

Let's see if we can apply the PAS to our model experiment (Fig. 1). Do the scale categories and their respective descriptions make sense with respect to the critical features? The first thing we notice is that the rating categories are the same irrespective of the task. They do not adapt to whether we ask for the color or shape of the prime; instead, they refer rather vaguely to "the stimulus" or just "something." Moreover, the four rating categories refer to the visibility or “clarity of experience” of the prime as a whole (like in a detection task), not to the visibility of a particular critical feature. Two of the categories (2, 3) acknowledge that there might be multiple perceptual cues. Three of the categories (1, 3, 4) refer additionally to decisional cues and to the confidence in the perceptual decision (in Table 1, we marked wording that is ambiguous with respect to the criterion content or that refers to decisional cues). In sum, it seems that observers are supposed to rate two things at once: the subjective detectability (but not the discriminability) of the prime, and their own confidence in making this judgment.

Originally, Ramsøy and Overgaard (2004) used their scale in a more differentiated way. They introduced it in an experiment where the critical stimulus was one of three shapes appearing in one of three colors and at one of three locations. The scale was then applied separately to each of those stimulus features: the feature of interest was specified first and then the scale was applied specifically to it. Does this work for our model experiment? If color is the critical feature, we want to know whether observers have awareness for the distinction between red and green. If we replace "stimulus" with "color" in the scale descriptions, we again notice how fuzzy they are. "No impression of the color" and "Non-ambiguous experience of the color" are still reasonably clear. But what about "A feeling that something colored has been shown. Not characterized by any content, and this cannot be specified any further"? What would the "content" of the colored thing mean here – the specific hues of red or green? And what is the ominous "this" that cannot be specified any further – the "content" or the "characterization"? In fact, the two middle categories are formulated in a way that makes them difficult to reconcile with feature discrimination; they are clearly designed with detection in mind. Because the dissociation paradigm almost invariably employs an indirect task based on discrimination and not detection, and because those two types of tasks may be based on orthogonal decision criteria (Macmillan, 1986), the PAS is generally not a suitable choice.

Let's switch perspective and ask what a dissociation experiment would look like for which the PAS would be a good choice. Because the direct measure focuses on detectability (plus confidence), the indirect task would have to be a detection task as well. The indirect effect would thus depend on the presence or absence of the critical feature, not its identity. The labeling of one rating category as "brief glimpse" further excludes tasks where the critical stimulus is presented for prolonged times, as in binocular rivalry, continuous flash suppression, or some inattention paradigms. It confines the PAS to experiments with briefly flashed stimuli, like masked priming or the attentional blink paradigm.Footnote 3

Claims of measurement properties and what they require

The literature on perception without awareness is full of claims concerning the measurement properties of various direct measures as measures of visual awareness, for instance their validity, exhaustiveness, or exclusiveness. If we translate the expression "Measure M is a valid measure of visual awareness" into "Measure M's criterion content CM contains the critical cue," we can specify the assumptions the respective claim has to meet (Table 2).

Table 2 Frequent claims about properties of a direct measure M, the assumptions about the criterion content CM implied by those claims, and possible counterarguments against those claims

The claim that a direct measure does at least have some validity is difficult to dismiss. Given our definition of validity, the claim only requires that qc be an element of the criterion content of the task. Even if a task is grossly misspecified, it is possible that participants spontaneously use the critical cue anyway. The PAS, for instance, was designed primarily with briefly presented stimuli in mind. If it is instead applied to temporally extended stimuli, as in rivalry paradigms, observers who have awareness of the critical feature may spontaneously rate their experience on their own internal four-point scale and map it onto the PAS categories, even if the wording does not fit (as anticipated in Kahneman, 1968).

The claim that a direct measure is an exclusive measure of awareness is only true if the criterion content consists solely of the critical cue and no other sources of information are used (not even those that are correlated with qc). This is a strong claim that can be countered empirically by showing that task performance is influenced by factors other than the critical stimulus feature, for example, by inducing different response strategies to show that strategic cues are being used on top of the critical cue.

The claim that only one specific task can measure awareness is even stronger because it is only true if every other measure's criterion content is devoid of the critical cue. This is an implausible assumption because other direct measures will be correlated with the task at hand under parametric variations of the critical feature (e.g., increasing color contrast would not only increase color discrimination performance but also confidence ratings or clarity ratings). And of course, there are usually alternative measures that also address the critical cue directly.

The claim that only one class of measures (e.g., only subjective ones or only objective ones) can measure awareness is frequently encountered in the literature, often with the suggestion that one or the other class is the "gold standard" for measuring awareness. But such a claim is only true if, for all measures not contained in that class, the criterion content is devoid of the critical cue. This is implausible for two reasons. First, there are many situations where the objective and subjective measures are highly correlated (e.g., Peremen & Lamy, 2014), and it is difficult to dismiss the possibility that they both use the critical cue. Second, it is often possible to find pairs of objective and subjective tasks that directly ask for the critical cue or feature. For instance, the objective task "Determine whether the prime was a square or a diamond" can easily be translated into a subjective task, "Could you see whether the prime was a square or a diamond? Answer yes or no." If objective and subjective tasks both explicitly ask about the critical cue, there is a strong possibility that they both use it.
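The logical structure of these claims can be sketched with sets. In the Python example below, the function `check_claims`, the measure names, and their criterion contents are all hypothetical; in practice, criterion contents are not directly observable, so the sketch only shows what each claim would require if they were known. The pair of shape tasks mirrors the square/diamond example above.

```python
def check_claims(measures, critical):
    """Evaluate which claims about direct measures hold for given contents.

    measures: dict mapping a measure name to (measure_class, criterion_content)
    critical: the critical cue qc
    """
    has_qc = {name: critical in content for name, (_, content) in measures.items()}
    report = {}
    for name, (cls, content) in measures.items():
        others = [n for n in measures if n != name]
        report[name] = {
            "valid": has_qc[name],                    # qc ∈ CM
            "exclusive": set(content) == {critical},  # CM = {qc}
            # only this measure contains qc
            "only_valid_measure": has_qc[name]
            and not any(has_qc[n] for n in others),
            # no measure of a different class contains qc
            "only_valid_class": has_qc[name]
            and not any(has_qc[n] for n in others if measures[n][0] != cls),
        }
    return report


measures = {
    "shape_discrimination": ("objective", {"prime_shape", "response_bias"}),
    "shape_yes_no": ("subjective", {"prime_shape"}),
    "clarity_rating": ("subjective", {"prime_contrast", "confidence"}),
}
for name, claims in check_claims(measures, "prime_shape").items():
    print(name, claims)
```

In this hypothetical configuration, the claim that only objective (or only subjective) measures are valid fails as soon as one measure from the other class also contains qc, which is exactly the counterargument raised above.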

Sometimes a measure is proposed to be exhaustive. We saw above that exhaustiveness has two aspects: one concerns the reliability of the measurement (whether the direct measure is an exhaustive integrator, i.e., a strictly monotonic function of the cues in its criterion content; Schmidt & Vorberg, 2006) and the other concerns its validity (whether the criterion content comprises all theoretically relevant cues). Exhaustive reliability is usually out of the question: even if the requirement of strict monotonicity only needs to hold at the level of expected values, it is difficult to meet for any empirical psychophysical measure that has a reliability clearly below 1 and an appreciable standard error. Exhaustive validity, on the other hand, requires the slightly less extreme assumption that the criterion content contains all theoretically relevant cues and that none of them remains unused. A claim that a measure is exhaustively valid can be countered by demonstrating that there is some theoretically relevant cue that the measure does not respond to, even though a rival measure does. Although this result could also occur if the measure in question does use all relevant cues but integrates them suboptimally, it calls the claim into doubt.

As an example, in Koster et al.'s (2020) study the data indicate that the objective measure (prime discrimination) is not exhaustively valid. If it were, its criterion content would include the perceived rotation between prime and mask because this rotation predicts the congruency of prime and mask: for instance, if you see a square target preceded by a rotating motion, you can make an educated guess that the prime has probably been a diamond. Objective discrimination performance should then increase, not decrease, with SOA, because perceived rotation increases as well. This implies that CO fails to cover the CS of the rotation measure: There is at least one cue in that subjective measure's criterion content that is not utilized in the objective measure.

Open-feature indirect tasks and invalid indirect measures

There are indirect tasks that do not generate a well-defined critical feature in the first place. The most important examples include indirect effects that do not depend on the identity of a prime, but only on its presence or absence (implicit detection tasks). Because detection can occur on the basis of any stimulus feature (it is based on the disjunction of all features), it is not clear from the task whether any one of them is critical, or which one is (see Wilken & Ma, 2004, for models of change detection in a disjunction of features). Other indirect effects may be driven by same-different distinctions or oddity detection (e.g., mismatch negativity, oddball tasks). For example, van Opstal et al. (2010) show that when observers make same-different judgments on a pair of target stimuli, they are primed by same-different relations in a masked pair of primes, even if the primes and targets come from separate stimulus domains. In such a task, it is difficult to pinpoint the critical feature.

Open-feature tasks are tasks that deliberately leave the choice of criterion content to the observer. This can be a great advantage: for instance, animals or little children can indicate whether two stimuli are the same or different even though they are unable to verbalize the difference (for instance, they may look preferentially at a new or mismatching stimulus and thus indicate that they have processed the difference). But for the dissociation paradigm, open-feature tasks provide a great challenge. Instead of fulfilling their "anchoring" function of providing a single critical feature, they are based on a set of possible features that all might drive the indirect effect. In consequence, these indirect tasks are based on some criterion content of their own, and the set of cues in that criterion content is usually not precisely known. If that is the case, it is difficult to find direct tasks that can provide a valid comparison, and any apparent "dissociation" is easily one between apples and oranges. Open features can greatly complicate the formal analysis of the dissociation paradigm: They essentially turn Fig. 3 into a display of three overlapping sets and increase the number of special cases to be considered.

How well do you know your direct measure? A checklist

From the foregoing, it should be obvious that the choice and construction of a suitable direct measure requires a lot of consideration, both on the theoretical and on the practical side. Table 3 provides a checklist for properties of direct measures that integrates many of the issues discussed in this paper, plus some practical issues that frequently arise in the measuring and testing process.

Table 3 How well do you know your direct measure?

Double dissociations among direct measures: moving beyond the classical dissociation paradigm

The dissociation paradigm in its classical form is based on the comparison of one direct and one indirect measure. If those two measures form a double dissociation (one increasing under experimental manipulation, the other decreasing), then we can dismiss the possibility that both measures are monotonic functions of the same single source of information (Schmidt & Vorberg, 2006; Biafora & Schmidt, 2020). In particular, they cannot both be based on a single source of conscious information, so that there must be a second information source dissociable from it. Of course, the same logic applies when we compare several direct measures. Many measures have been proposed that are all supposed to measure awareness of the prime. But if there are double dissociations among those measures, it follows that they cannot all measure the same unitary source of information (i.e., the same cue), or indeed the same criterion content (i.e., the same set of cues).

In the following, we explore the consequences of double dissociations among a set of direct measures. We assume that all cues and measures are scaled with the same polarity, such that larger values indicate greater evidence for the feature in question. Following Schmidt and Vorberg (2006), we define dissociations by comparing measures under pairs of experimental conditions. To simplify matters, we use the symbols <<, >>, and == to indicate that measures obtained under two experimental conditions are unequivocally different or similar, for instance because they passed a statistical or numerical criterion (leaving aside the statistical issues).

Definition (v)

Dissociation and association. Let Ai and Bi denote two measures A, B with criterion contents CA, CB, observed under two experimental conditions i, i ∈ {1, 2}. Assume that A and B are scaled with the same polarity. Then A and B form a simple dissociation if A1 << A2 and B1 == B2 (or vice versa), a double dissociation if A1 >> A2 and B1 << B2 (or vice versa), and an association if either A1 << A2 and B1 << B2 or A1 >> A2 and B1 >> B2.
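Definition (v) can be made operational with a small sketch. The thresholds `delta` and `tol` below are hypothetical stand-ins for whatever statistical or numerical criterion decides that two conditions are "unequivocally" different (<<, >>) or similar (==); the definition deliberately leaves that criterion open. Differences that meet neither criterion are reported as unclassified rather than forced into a category. The measure values are illustrative, not data.

```python
from enum import Enum


class Relation(Enum):
    LESS = "<<"        # condition 1 clearly below condition 2
    GREATER = ">>"     # condition 1 clearly above condition 2
    EQUAL = "=="       # conditions clearly similar
    UNDECIDED = "?"    # neither criterion met


def compare(x1: float, x2: float, delta: float, tol: float) -> Relation:
    """Three-valued comparison of one measure across conditions 1 and 2.

    Hypothetical criterion: |x2 - x1| > delta counts as unequivocal,
    |x2 - x1| < tol counts as ==, anything in between is undecided.
    """
    diff = x2 - x1
    if diff > delta:
        return Relation.LESS
    if diff < -delta:
        return Relation.GREATER
    if abs(diff) < tol:
        return Relation.EQUAL
    return Relation.UNDECIDED


def classify(a: tuple[float, float], b: tuple[float, float],
             delta: float = 0.05, tol: float = 0.01) -> str:
    """Classify measures A = (A1, A2) and B = (B1, B2) per Definition (v)."""
    ra = compare(*a, delta, tol)
    rb = compare(*b, delta, tol)
    changes = {Relation.LESS, Relation.GREATER}
    if ra in changes and rb in changes:
        # Same direction: association; opposite directions: double dissociation.
        return "association" if ra == rb else "double dissociation"
    if (ra in changes and rb == Relation.EQUAL) or (rb in changes and ra == Relation.EQUAL):
        return "simple dissociation"
    return "unclassified"
```

For example, `classify((0.60, 0.80), (0.70, 0.55))` returns `"double dissociation"` with the default thresholds, because A clearly increases while B clearly decreases.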

We next prove that two double-dissociated measures cannot measure the same unitary content. The proof closely follows the one in Schmidt and Vorberg (2006).

Proposition 1

No single source for double dissociations. Assume two measures A, B with criterion contents CA, CB, that are both monotonic integrators, are scaled with the same polarity, and are observed under experimental conditions i, i ∈ {1, 2}. Then a double dissociation between A and B rules out that both criterion contents consist of the same single cue, q*.

Proof

Suppose that A1 << A2 while B1 >> B2 (the proof for the reverse case is analogous). We show that the postulate CA = CB = {q*} leads to a contradiction. By this postulate and the assumption of monotonic integration, both A and B are monotonic functions of q* only. The observation that A1(q*) << A2(q*) thus implies that q*'s value has increased from condition 1 to condition 2. At the same time, the observation that B1(q*) >> B2(q*) implies that q*'s value has decreased from condition 1 to condition 2, which completes the contradiction. □

Consider again the eight measures included in Koster et al.'s (2020) study. We start by theorizing that they are all measures of the same unitary perceptual content, "awareness of the prime." In other words, we postulate that for each measure Mi, the criterion content CMi = {qa}, where qa is awareness of the prime. This postulate runs into trouble because there are double dissociations between some of the measures, which implies that they cannot all measure the same thing. This has far-reaching consequences for other potential measures, even those not included in Koster et al.'s set. As soon as two measures form a double dissociation, dissociative relations can spread across the entire network of potential measures. To see this, consider a direct measure Minc that clearly increases under experimental manipulations, and another direct measure Mdec that clearly decreases. The two form a double dissociation with respect to awareness of the prime, which we write as DD(Minc, Mdec). But any other measure Mi that clearly increases or decreases under the manipulation will be double-dissociated with either Minc or Mdec: either DD(Mi, Minc) or DD(Mi, Mdec). It becomes clear that double dissociations are contagious: as soon as there is even one in a set of possible measures of the same perceptual content, there will likely be others.
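The contagion argument can be illustrated with a short sketch that lists every double-dissociated pair in a set of measures observed under two conditions. The threshold `delta` is again a hypothetical stand-in for an "unequivocal change" criterion, and the measure names and values are illustrative only; they are not Koster et al.'s (2020) data.

```python
from itertools import combinations


def direction(x1: float, x2: float, delta: float = 0.05) -> int:
    """+1 if a measure clearly increases from condition 1 to 2,
    -1 if it clearly decreases, 0 otherwise (hypothetical threshold)."""
    diff = x2 - x1
    if diff > delta:
        return 1
    if diff < -delta:
        return -1
    return 0


def double_dissociations(measures: dict[str, tuple[float, float]],
                         delta: float = 0.05) -> list[tuple[str, str]]:
    """Return every pair of measures that change in opposite directions."""
    dirs = {name: direction(*vals, delta) for name, vals in measures.items()}
    return [(a, b) for a, b in combinations(measures, 2)
            if dirs[a] * dirs[b] == -1]


measures = {"M_inc": (0.55, 0.80), "M_dec": (0.75, 0.60)}
print(double_dissociations(measures))   # [('M_inc', 'M_dec')]

# Any further measure that clearly changes (hypothetical M_new, increasing)
# is double-dissociated from one of the two original measures.
measures["M_new"] = (0.50, 0.65)
print(double_dissociations(measures))   # [('M_inc', 'M_dec'), ('M_dec', 'M_new')]
```

A measure that does not change clearly (direction 0) never enters a double dissociation. Every measure that does change clearly is paired with whichever of `M_inc` and `M_dec` moves the other way.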

Consequences for theory-building: "Explaining dissociated gradients"

We are now ready to expand the dissociation paradigm by giving up the classical distinction between direct and indirect measures. The privileged roles of the indirect task, the critical feature, and the critical cue all fall away. What remains is a large set of possible measures with different criterion contents, different modes of integration, different measurement properties, and different measurement objectives. Some may be direct, others indirect; some objective, others subjective (Fig. 6). Whenever any two of them become double-dissociated, their criterion contents may overlap but cannot be restricted to a single perceptual content.

Fig. 6

When the distinction between direct and indirect measures is given up, what remains is a set of measures whose criterion contents may overlap in various ways. Double dissociations between any two of them imply that their criterion contents cannot be restricted to the same single cue.

The empirical observation of double dissociations among direct measures (Koster et al., 2020) has immediate consequences for any theory of consciousness. In particular, any theory that explains "consciousness" by a single monotonic process or mechanism is challenged by double dissociations between direct measures. The proof is practically identical to the previous one:

Proposition 2

No simple theory for double dissociations. Assume two direct measures Ci, Di that are scaled with the same polarity and are observed under experimental conditions i, i ∈ {1, 2}. Assume a theory T that explains variations in C and D as a monotonic function of a single process p, C = f(p) and D = g(p), such that f(p') ≤ f(p) and g(p') ≤ g(p) for any p' ≤ p. Then T is falsified by a double dissociation between C and D.

Proof (Schmidt & Biafora, 2022)

Suppose that C1 << C2 while D1 >> D2 (the proof for the reverse case is analogous). We show that the postulate C = f(p) and D = g(p) leads to a contradiction. By assumption of monotonicity of f and g, both C and D are monotonic functions of p only. The observation that C1(p) << C2(p) thus implies that p's value has increased from condition 1 to condition 2. At the same time, the observation that D1(p) >> D2(p) implies that p's value has decreased from condition 1 to condition 2, which completes the contradiction. Therefore, C and D cannot both be monotonic functions of p, and theory T is falsified.□

If several direct measures are available simultaneously, we call this set a gradient (Schmidt & Biafora, 2022). Here, we are especially interested in gradients that contain at least one double dissociation between direct measures, such that one of them increases across stimulus conditions while another of the same polarity decreases. Proposition 2 immediately generalizes to gradients of multiple direct measures by requiring any theory of visual awareness to explain the gradient of awareness measures. If a gradient contains at least one double dissociation, any theory explaining "consciousness" by means of a monotonic function of a single process is falsified:

Proposition 3

Explaining dissociated gradients. Suppose a set of direct measures Dj, j ∈ {1, 2, …}, that are scaled with the same polarity and where at least two of the measures form a double dissociation. Let T be a theory that explains variations in Dj as a monotonic function fj of a single process p, Dj = fj(p), such that fj(p') ≤ fj(p) for all p' ≤ p. Then T is falsified.

Proof

Because the set of direct measures contains at least one double dissociation by definition, Proposition 2 applies. □

In other words, it is our belief that any theory of visual awareness must be specific enough to explain changes in different facets of awareness simultaneously, not just one at a time. This requirement to "explain dissociated gradients" is certainly a challenge to any theory of consciousness, because it needs to explain the variation in an entire multivariate set of awareness measures, especially if there are dissociations among them.

General discussion

Cue Set Theory is a theory of measurement. Just as an empirical theory can be judged by the data it can explain and predict, a theory of measurement can be evaluated by its ability to clarify concepts, clear up misunderstandings, explain and predict methodological difficulties, and ultimately improve measurement tools. We believe that CST contributes to the clarification of concepts by elucidating the idea of criterion content, its variation across awareness measures, and the crucial role of the critical cue as a pivot between direct and indirect measures. CST also clarifies the concept of exhaustiveness as having a validity as well as a reliability aspect – a distinction that was not clear to us before we attempted to formalize our theory. In this respect, the current paper can be viewed as a companion piece to the earlier one by Schmidt and Vorberg (2006): while the older paper dealt with the reliability aspect of awareness measures and regarded double dissociations as a useful tool for circumventing the exhaustiveness problem, CST now explores the validity issues that ensue when double dissociations start to appear among alternative measures of awareness. In a way, we are looking at the dark side of double dissociations here, which appear as brute facts of awareness measurement and raise fundamental problems for theories of consciousness.

We further hope that CST will help clear up some fundamental misunderstandings, for example, the belief that in order to measure subjective awareness, the direct measures must likewise be subjective. Indeed, the sometimes fierce battles between proponents of subjective and objective measures should largely be settled by the realization that such measures may have overlapping but nonidentical criterion contents, that either may give invaluable information the other one could not provide, and that both types of measures can comfortably be united, e.g., in a simple bipolar rating scale. CST therefore has the potential not only to improve the quality of awareness measures, but also to specify the scopes and limitations of each such measure (e.g., in our critique of the PAS). Finally, even a theory of measurement may have some capacity for empirical predictions. Specifically, we expect that double dissociations between measures of awareness will be abundant, will continue to be discovered, and will require more specific and more refined theoretical explanations. In the course of that, we expect that research into cognition without awareness will increasingly be viewed as a study of task dissociations. Nevertheless, the dissociation paradigm will remain instrumental for providing the database for such a research project if applied in a straightforward, principled way.

CST expands Kahneman's (1968) notion of criterion content to sets of cues. Importantly, these cues are not necessarily perceptual – they may include feedback from the motor system, feedback from the decision-making process, and strategic knowledge of the task. In this regard, CST differs from other multifeatural conceptions of awareness that only consider perceptual information at different levels of an assumed processing hierarchy (Fazekas & Overgaard, 2018; Kouider et al., 2010). The most important property of the dissociation paradigm according to CST is that the indirect task implies a critical feature that anchors the critical cue – the critical aspect of perceptual experience that the direct measures are supposed to capture. Direct measures whose criterion contents do not contain this critical cue are often valuable and informative in their own right but provide no foundation for establishing a dissociation with the indirect measure. They essentially try to compare their own apples with the indirect measure's oranges (Erdelyi, 1986). Likewise, unspecific measures that do not focus explicitly on the critical cue, like the PAS, do not provide a solid basis to argue for dissociation from the indirect measure. Fortunately, it is often straightforward to identify the critical feature and to construct direct measures targeting it, both objective and subjective ones.

CST explicitly acknowledges that criterion content may vary between observers – perhaps because of idiosyncratic differences in their perceptual systems (e.g., in the time course of visual masking; Albrecht & Mattler, 2016), because of different strategies in forming the criterion content, but also because of different ways of integrating the available cues (Bernstein et al., 1976; Jannati & DiLollo, 2012; Ventura, 1980). Of course, such idiosyncrasy complicates measurement as well as the interpretation of measures. One course of action is to use direct measures that explicitly ask for a particular content, and to train observers to report only on that content. In our opinion, it is furthermore essential to set up experiments in such a way that individual data patterns can be evaluated reliably. This is why we prefer a small number of trained observers performing many trials (generally, several sessions) to a large group of observers performing only a few trials. We are therefore advocating the psychophysical measurement standard now discussed under the label “small-N design” (Smith & Little, 2018; see Arend & Schäfer, 2019, and Baker et al., 2021, for demonstrations and easy calculations of adequate statistical power in such designs). Note that it is never advisable to average across observers with qualitatively different data patterns.Footnote 4

CST also provides a new justification for employing different direct measures in the same paradigm. Once there is at least one direct measure that is reasonably valid in capturing the critical cue, other measures can focus on different facets of measurement. Because these additional measures do not need to utilize the critical cue, their criterion contents are free to give separate information about the observer's performance or experience. And indeed, if no measure at all is singled out to anchor the critical feature, all that remains are multiple facets of measurement that can be compared for their properties. For example, Zehetleitner and Rausch (2013) show that ratings of decision accuracy can outperform stimulus ratings. Koster et al. (2020) show that performance in the objective discrimination task can be low even in the presence of rich subjective perception of other aspects of the critical stimulus, and that those subjective cues can be dissociated among each other. Similarly, Maniscalco, Peters, and Lau (2016) demonstrate that double dissociations can occur between two direct measures (an objective measure of performance, d’, and a subjective measure of confidence, meta-d’) and show how such a dissociation is predicted by signal detection theory.
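For readers less familiar with signal detection measures, the objective sensitivity index d’ mentioned above can be sketched in a few lines, assuming a yes/no design in which d’ = z(H) - z(FA), the difference between the z-transformed hit and false-alarm rates. The log-linear correction (adding 0.5 to each count and 1 to each total) is one common convention for avoiding infinite z-scores at rates of 0 or 1; it is our illustrative choice, not a procedure taken from the studies cited. The sketch does not compute meta-d’, which requires fitting a signal detection model to confidence ratings.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int,
            correct_rejections: int) -> float:
    """Yes/no sensitivity d' = z(H) - z(FA) with a log-linear correction.

    The correction (0.5 added to each count, 1 to each total) keeps the
    corrected rates strictly between 0 and 1, so the inverse normal CDF
    stays finite even for perfect performance.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)
```

With 50 hits, 50 misses, 50 false alarms, and 50 correct rejections, both corrected rates equal 0.5 and d’ is 0. Swapping hits and false alarms flips the sign of d’.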

Giving up the distinction between direct and indirect measures takes care of another fundamental puzzle in the history of consciousness research: What is used as a direct measure of awareness in one study may be used as an indirect measure of unconscious processing in another (Timmermans & Cleeremans, 2015). For instance, Peirce and Jastrow (1885) argued that their participants could successfully discriminate between two objects even though they indicated that they had no confidence in their decisions. Similarly, Sidis (1898) tried to create stimulus conditions such that participants reported being unable to detect the stimuli and yet showed some ability to discriminate between them, and concluded that discrimination was based on unconscious perception (a recent paper by Stein and Peelen, 2021, uses the same argument). From the viewpoint of the subsequent measurement tradition in unconscious perception, those authors used measures that were all indicators of visual awareness, and therefore those demonstrations of dissociations among commonplace awareness measures may appear unsettling. From the perspective of CST, however, there is no contradiction because different direct measures are allowed to be based on different criterion contents, which in turn allows for dissociations between different facets of awareness.

Can the CST perspective clarify other difficulties arising from a unitary concept of awareness? One of our own studies is a good case in point. Schmidt et al. (2010) used a visual illusion to study possible dissociations between the appearance of differently shadowed surfaces (one appearing lighter in surface color than the other because of the illusion; see Adelson, 2000) and the sign of the response priming effect evoked by those surfaces in a flanker task. By comparing different illusion conditions, we showed that response priming and lightness judgments could contradict each other because lightness matches always followed the illusion while response priming simply depended on the local contrast of the surfaces. We interpreted those findings in terms of conscious visual perception (of surface lightness) versus unconscious response activation (by local contrast). However, instead of performing lightness matches of the critical stimuli, participants could just as well have been asked to directly judge their local contrast. Because it is trivially easy to say whether a patch of grey is brighter or darker than its immediate background, those judgments would have been highly accurate and closely associated with the response priming effect. This means that our conclusion was exaggerated: instead of claiming a dissociation between response priming and visual awareness in general, we could have (1) diagnosed the double dissociation between awareness for contrast and awareness for lightness, and (2) concluded that response priming was associated with awareness for contrast but dissociated from awareness for lightness. Simply acknowledging the possibility of different direct measures with nonidentical criterion contents would have greatly simplified the interpretation of those findings.

From such considerations, Koster et al. (2020) draw the following conclusions:

“[…] subjective experience has to be conceived as a multidimensional pattern of experiences. It is important to note that this finding casts doubt on all attempts to measure visual awareness in a single univariate measure because some other aspects of visual experience might always vary in opposite ways across a given parameter such as SOA. In consequence, the idea of an exhaustive measure or a gold standard for measuring consciousness appears simplistic.” (p. 20)

The philosopher Elizabeth Irvine (2017) comes to a similar conclusion. In a paper entitled “Explaining What?”, she distinguishes between the concepts of “Konsciousness” (with a capital “K”) and “schmonciousness.” Believers in Konsciousness have a hard, monolithic concept of what they want to explain, “a single, coherent and unitary explanatory target” that may find explanation in a single sweeping theory. Believers in “schmonciousness,” on the other hand, have a much more modest concept: they assume that the term “consciousness” is still volatile and maybe even prescientific, and that it may disintegrate into the study of many more specific aspects:

“Rather than keep trying (and failing) to identify which state or process consciousness really is, the idea is to accept the fragmentation […]. […T]rying to explain consciousness with a single materialist blow is just as confused as trying to explain intelligence […], health or happiness by pointing to a single mechanism, gene, or causal factor.” (p. 9)

CST leads to a similar conclusion. Our Propositions 1–3 state that theories explaining the entirety of consciousness out of a single monotonic process are falsified by double dissociations among measures of awareness. This leads to a simple demand that can be placed on any theory aiming to explain consciousness. We call this demand "Explaining dissociated gradients" (cf. Schmidt & Biafora, 2022). We define a gradient as a set of measures responding to specific changes in experimental conditions; for example, Fig. 2 shows a gradient that consists of a small set of direct measures in response to a variation in prime-target SOA. When experimental conditions are varied, the gradient traced out by this process may be surprisingly complex: some measures may increase with parameter changes, others decrease; some may be u-shaped, others invariant; some may respond to some experimental variations but not others. Convincing theories of visual awareness should aim at explaining such gradients, at least for some direct measures and some experimental conditions at a time (see Doerig et al., 2020, for further criteria that could be applied to such theories).
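Propositions 2 and 3 are stated for two experimental conditions. If we additionally assume (our extension, not part of the propositions as stated) that a gradient spans several conditions, such as several SOAs, the same logic can be applied to every pair of conditions. If Dj = fj(p) with every fj nondecreasing, then a clear increase of any measure from condition a to condition b forces pa < pb, and contradictory order constraints (a cycle) falsify any single-process account. The sketch below checks for such a cycle; the threshold `delta`, the function name, and the example values are hypothetical.

```python
from itertools import combinations


def single_process_falsified(gradient: dict[str, list[float]],
                             delta: float = 0.05) -> bool:
    """Return True if no single monotonic process p can produce the gradient.

    `gradient` maps each direct measure (all scaled with the same polarity)
    to its values under k experimental conditions. A clear increase of any
    measure from condition a to b implies p_a < p_b; a cycle among these
    order constraints is a contradiction. `delta` is a hypothetical
    threshold standing in for an "unequivocal difference" criterion.
    """
    k = len(next(iter(gradient.values())))
    later = {c: set() for c in range(k)}  # b in later[a] means p_a < p_b
    for a, b in combinations(range(k), 2):
        for values in gradient.values():
            diff = values[b] - values[a]
            if diff > delta:
                later[a].add(b)
            elif diff < -delta:
                later[b].add(a)

    # Depth-first search for a cycle in the order constraints.
    unvisited, active, done = 0, 1, 2
    state = [unvisited] * k

    def cycle_from(node: int) -> bool:
        state[node] = active
        for nxt in later[node]:
            if state[nxt] == active:
                return True
            if state[nxt] == unvisited and cycle_from(nxt):
                return True
        state[node] = done
        return False

    return any(state[c] == unvisited and cycle_from(c) for c in range(k))
```

A u-shaped measure on its own, e.g. `{"U": [0.8, 0.3, 0.8]}`, does not falsify a single-process account, because p itself may vary nonmonotonically with SOA. Adding a second measure that rises from the first to the second condition while U falls, e.g. `"V": [0.2, 0.5, 0.6]`, produces contradictory constraints and returns `True`.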

Explaining dissociated gradients requires a theory that is sufficiently specific about the facets of awareness involved (Klein & Hohwy, 2015). If visual awareness consists in a multidimensional pattern of dissociable cues, each of those cues requires sophisticated measurement, explanation, and theorizing. A theory trying to explain the simultaneous experience of color and motion in a masked stimulus must therefore involve a theory of color and motion before it can begin to explain why these impressions are conscious. Such a theory of consciousness is not in sight. The most prominent current theories attempt to explain consciousness out of a single process: For instance, Global Workspace Theory postulates a widespread "ignition" in neural activity (Baars, 1993, 2013; Dehaene & Naccache, 2001), and Integrated Information Theory postulates that consciousness is a consequence of the amount of "integrated information", Φ (Tononi & Edelman, 1998; Tononi, 2004; but see Oizumi et al., 2014, for a formulation of the theory that seems to allow for multiple Φi). These theories have in common that their explanatory process is strictly unidimensional. Because they are trying to specify the neural correlate of a unitary process of consciousness, they are not able to explain why one facet of visual awareness increases while another one decreases: they fail to "explain the gradient" of the experiment. Ultimately it is the dynamics of those facets that need to be explained by any fully developed theory of visual awareness. We hope that CST will help transform the field of consciousness research into a detailed, sophisticated study of task dissociations among direct and indirect measures, not merely by "accepting the fragmentation" (Irvine, 2017), but by appreciating the fascinating kaleidoscopic nature of conscious and unconscious vision.