Introduction

Perceiving our visual world seems effortless (Hoffman, 1998), yet the contents of perception are impacted by complex selection processes that bring preferred, important, and/or salient items to the forefront of visual experience. Put another way, visual attention can change what we see even as the scene before us remains unchanged, by selecting parts of that scene for enhanced processing (Egeth & Yantis, 1997; Yantis, 1998; Carrasco, Ling, & Read, 2004). Bistable perceptual phenomena, in which an observer’s visual experience periodically alternates between competing percepts despite unchanging sensory input, provide a unique opportunity to examine attention’s role in determining what we see. In particular, those phenomena allow us to address an important question: are endogenously generated changes in perceptual experience (such as those that characterize bistable perception) necessarily driven by visual attention mechanisms? While attention is often defined as a selective mechanism that chooses among competing alternatives, what remains unclear is the extent of its role in selecting between conflicting monocular inputs (e.g., in binocular rivalry [BR]) or perceptual interpretations (e.g., forms of bistable stimuli, such as the Necker Cube, Rubin’s face/vase, ambiguous structure-from-motion). With regard to both BR and bistable perception in general, the full range of attention-related hypotheses has been proposed. 
These range from conceptualizations based on adaptation and inhibition without any mention of an involvement of attention (Matsuoka, 1984; Wilson, 2007) to ones that posit an essential role for attention in bistable perception, either for driving switches between interpretations or even for allowing either interpretation to prevail in the first place (Brascamp & Blake, 2012; Helmholtz, 1925; Ooi & He, 1999; Walker, 1978; Zhang, Jamison, Engel, He, & He, 2011). Indeed, various authors (Kanai, Bahrami, & Rees, 2010; Knapen, Brascamp, Pearson, van Ee, & Blake, 2011; Lumer, Friston, & Rees, 1998; Slotnick & Yantis, 2005; Zaretskaya, Thielscher, Logothetis, & Bartels, 2010) have pointed to a neuroanatomical correspondence between brain areas involved in bistable perception and those implicated in attention and attention shifts (Corbetta, Patel, & Shulman, 2008; Yantis et al., 2002). Determining visual attention’s actual role during bistable perception could help us understand the mechanisms that resolve uncertainty among competing perceptual interpretations that are widely believed to arise during routine, everyday vision (Geisler, 2011; Hohwy, 2012). In addition, both BR and other forms of perceptual bistability are thought to involve interactive processing across multiple levels of the visual hierarchy (Blake & Logothetis, 2002; Long & Toppino, 2004), suggesting that a better understanding of how attention influences these phenomena will reveal insights about how attention coordinates visual activity to create a coherent visual experience (Serences & Yantis, 2006). For these reasons, this essay seeks to assemble and interpret evidence bearing on the question of attention and perceptual bistability.

To start, then, what can be said about processes responsible for BR and other forms of bistable perception? Considering first BR, several lines of evidence suggest that BR exhibits notable dependency on neural events transpiring within early stages of visual processing (Blake, Tadin, Sobel, Raissian, & Chong, 2006; Ooi & He, 2003; Tong & Engel, 2001; Wunderlich, Schneider, & Kastner, 2005) where conflicting monocular information conveyed by the two eyes first competes within binocular mechanisms. At the same time, there is also evidence that the dynamics of BR depend crucially on relatively ‘high-level’ processes embodying information about the affective content of the competing stimuli and on the expectations and intentions of the observer viewing those stimuli (see review by Blake, 2014). Also in support of the involvement of high-level processes is evidence that a stronger relationship between rivalry and neural activity emerges in later visual areas (Leopold & Logothetis, 1996; Logothetis & Schall, 1989; Sheinberg & Logothetis, 1997). Furthermore, other forms of perceptual bistability do not involve competition between information presented separately to the two eyes, yet they exhibit perceptual dynamics that bear striking resemblance to those characteristic of BR (Brascamp, Klink, & Levelt, 2015; Carter & Pettigrew, 2003). Is it reasonable, therefore, to conclude that all forms of bistability—BR and other forms—might be linked by an attentional process whereby one stimulus representation is strengthened and/or the winning representation is registered within higher visual areas?

There are two distinct (and non-exclusive) ways in which attention may affect bistable perception. First, attention may act as a modulatory influence that alters the dynamics of bistable perception. This manifests, for example, as changes in the rate of perceptual alternation (Alais et al., 2010; Kohler et al., 2008; Kornmeier, Hein, & Bach, 2009; Lack, 1978; Paffen et al., 2006; Pastukhov & Braun, 2007; Reisberg & O'Shaughnessy, 1984; Schölvinck & Rees, 2009; Stonkute et al., 2012; Suzuki & Grabowecky, 2007), or as a bias in perception in favor of an attended perspective (Chong, Tadin, & Blake, 2005; Dieter, Melnick, & Tadin, 2015; Hol, Koene, & van Ee, 2003; Meng & Tong, 2004; Mitchell, Stoner, & Reynolds, 2004; Ooi & He, 1999; Suzuki & Peterson, 2000; Toppino, 2003). These modulatory influences of attention on bistable perception have been reviewed previously (Dieter & Tadin, 2011; Paffen & Alais, 2011), and we return to some relevant points later in our essay. The primary focus of this essay, however, is the possibility that attention plays an even more fundamental role in bistable perception—namely, that attention actually promotes the neural states that underlie competing perceptual interpretations.

To deduce whether attention plays an essential role in the process of perceptual bistability, we frame the issue in the form of a question: do alternations between competing neural representations persist when one does not attend to a bistable stimulus? If alternations cease during periods of complete inattention, this would constitute evidence that attention is required to drive the typical dynamics of perceptual bistability. On the other hand, evidence indicating the persistence of alternations outside of attention (even at an altered rate) would indicate that a key aspect of perceptual bistability can transpire without attention. This would further suggest that mechanisms separate from attention are sufficient to give rise to the dynamics of bistability.

If evidence reveals that alternations in perceptual bistability cease outside of visual attention, one would then like to know the answer to a second question: what scenario replaces the typical alternation cycle when bistable stimuli are unattended? Two likely scenarios exist. First, the competition between the two alternatives might not be resolved at all, with the visual system remaining in a type of “mixture” state where both possible perspectives are equally dominant at the same time. Under a second scenario one alternative does win out, but the system never switches to the competing alternative (i.e., “winner-take-all”). The latter outcome would be reminiscent of the persistent dominance experienced when bistable images are presented intermittently (Pearson & Brascamp, 2008; but note that even these intermittently presented stimuli alternate eventually, Brascamp, Pearson, Blake, & van den Berg, 2009).

From the outset, we want to note that the possibility of bistability outside of attention is, in principle, separate from the question of whether participants can subjectively report perceptual switches under inattention. Indeed, a central challenge when investigating this possibility is that a person not attending to a bistable stimulus often cannot report his/her subjective perceptual experience of that stimulus. This is expected given the known impact of inattention on perception of even salient stimuli (Rensink, O’Regan, & Clark, 1997; Simons & Chabris, 1999). So, investigators must turn to other methods in order to assess the response of the visual system to an unattended bistable stimulus. For instance, some researchers have measured patterns of neural activity during the viewing of unattended bistability, while others have inferred the response during a period of inattention from the perceptual experience that ensues after the period of inattention has ended. Taken together, the evidence accumulated so far (and reviewed here) reveals a dissociation of BR from other forms of perceptual bistability: while alternations during BR are abolished in the absence of attention, other forms of bistability continue to fluctuate (perhaps at a slower rate). Several of the studies that investigated BR further provide some initial answers regarding whether inattention leads to a winner-take-all situation (with unending dominance of one eye’s image) or, alternatively, a situation where the conflict is not resolved in favor of either alternative. Added to evidence that attention’s modulatory effect on BR is dissociable from that on other forms of perceptual bistability (Dieter & Tadin, 2011), this pattern further supports the notion that the mechanisms underlying these outwardly similar phenomena are at least partially independent.

For the reasons previewed above and explained in detail in the remainder of this essay, we discuss studies that used BR stimuli separately (in the next section) from those that involved other forms of bistable stimuli (in the section following our discussion of BR).

The fate of unattended binocular rivalry

BR occurs when incompatible images in the left and right eye (represented independently at the earliest stages of the visual system) come together, as the visual system converts from a monocular to a binocular representation of the visual world. Because binocular correspondence cannot be established between these monocular images, they instead engage in ongoing perceptual competition, whereby each image alternately dominates visual awareness for several seconds at a time (e.g., see reviews by Alais, 2012; Blake, 1989). It is conceivable that the pattern of stochastic perceptual alternations observed during BR can be produced in the absence of visual attention—indeed, many neural models of the phenomenon produce such alternations through mutual inhibition, adaptation, and neural noise, without reference to attention (Laing & Chow, 2002; Lehky & Blake, 1991; Shpiro, Moreno-Bote, Rubin, & Rinzel, 2009). However, even if these models accurately represent the neural processes underlying BR this does not preclude a role for visual attention, as attention could interact with any of those modeled components and/or play a role in the readout of the winning neural signal. As a result, empirical studies in which attention is diverted from rival stimulation are essential to establishing whether BR alternations are independent of attention.
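To make concrete how mutual inhibition and adaptation alone can generate alternations, the following is a minimal sketch of a generic two-unit competition model. It is not an implementation of any of the cited models: the equations (threshold-linear units, linear adaptation), all parameter values, and the function names are assumptions chosen for illustration, and noise is switched off by default so that the run is deterministic.

```python
import random

def simulate_rivalry(duration_ms=20000, dt=1.0, drive=1.0, inhibition=2.0,
                     adaptation_gain=3.0, tau=10.0, tau_adapt=1000.0,
                     noise_sd=0.0, seed=0):
    """Euler simulation of two units that inhibit each other and slowly adapt.

    Returns a list holding, for every time step, the index (0 or 1) of the
    unit with the larger response. All parameters are illustrative only.
    """
    rng = random.Random(seed)
    u = [0.1, 0.0]   # responses; a small head start breaks the initial symmetry
    a = [0.0, 0.0]   # slow adaptation variables
    dominant = []
    state = 0
    for _ in range(int(duration_ms / dt)):
        new_u = []
        for i in (0, 1):
            j = 1 - i
            # Input drive minus cross-inhibition minus self-adaptation (+ noise).
            net = drive - inhibition * u[j] - adaptation_gain * a[i]
            net += noise_sd * rng.gauss(0.0, 1.0)
            rate = max(0.0, net)          # threshold-linear response function
            new_u.append(u[i] + dt * (rate - u[i]) / tau)
        # Adaptation tracks each unit's own response on a slow time scale.
        a = [a[i] + dt * (u[i] - a[i]) / tau_adapt for i in (0, 1)]
        u = new_u
        if u[0] != u[1]:
            state = 0 if u[0] > u[1] else 1
        dominant.append(state)
    return dominant

def dominance_durations(dominant, dt=1.0):
    """Durations (ms) of completed dominance periods; the last, open one is dropped."""
    durations = []
    run = 1
    for prev, cur in zip(dominant, dominant[1:]):
        if cur == prev:
            run += 1
        else:
            durations.append(run * dt)
            run = 1
    return durations

trace = simulate_rivalry()
durations = dominance_durations(trace)
print(len(durations), "completed dominance periods")
```

In this sketch the dominant unit adapts until the suppressed unit escapes inhibition, so alternations arise with no attention term at all. As the text notes, that outcome does not rule out attention acting on the drive, the inhibition, or the readout in the real system.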

Two studies have provided evidence that viewing BR stimuli as part of a dual-task paradigm leads to a slowing of alternations. Specifically, reporting one’s percept during BR concurrently with either a peripheral visual (Paffen et al., 2006) or an auditory (Alais et al., 2010) distractor task occasions a reduction in rivalry rate, suggesting that switching is tied to the strength of visual attention. However, because some degree of attention was always directed to the BR stimuli, it is not clear whether attention’s contribution to perceptual alternations is merely modulatory or, in fact, essential for switches to occur. As such, we note a fundamental distinction between such dual-task (“partial attention”) approaches, and those in which the observer need not report on the bistable stimulus (“complete inattention”) (Fig. 1).

Fig. 1

Impact of inattention on binocular rivalry and bistable perception. Alternations during bistable perception under conditions of inattention can be classified into three categories: typical (i.e., same dynamics as when these stimuli are attended), slowed (i.e., reduced alternation rate relative to when these stimuli are attended), or none (i.e., alternations between percepts cease). We first classified studies by methodology, labeling studies as involving partial attention to rival stimuli if they utilized a dual-task approach in which observers continued to report their percept while also completing a distracting attentional task. Studies in which observers did not report their perceptual state during unattended periods were classified as involving complete inattention to perceptual bistability. Though there is a degree of subjectivity in this figure, our characterization (above) generally matches the conclusion promoted by the authors of the empirical finding in question (see further discussion of these papers in main text). Reference 13 is plotted in between “slowed” and “typical” as those results demonstrate that alternations must have occurred outside attention, but do not provide a direct measurement of alternation rate (see main text). Some findings that are discussed in the main text are not included in this figure, as their implications for alternation rate are less clear. Partial attention, BR: (1) Paffen, Alais, & Verstraten, 2006; (2) Alais, van Boxtel, Parker, & van Ee, 2010. Partial attention, bistable stimuli: (3) Pastukhov & Braun, 2007; (4) Stonkute, Braun, & Pastukhov, 2012; (5) Schölvinck & Rees, 2009; (6) Reisberg & O'Shaughnessy, 1984; (7) Kohler, Haddad, Singer, & Muckli, 2008; (8) Intaite, Koivisto, & Revonsuo, 2012. Complete inattention, BR: (9) Zhang et al., 2011; (10) Brascamp & Blake, 2012; (11) Cavanagh & Holcombe, 2006; (12) Leopold, Fitzgibbons, & Logothetis, 1995. Complete inattention, bistable stimuli: (3) Pastukhov & Braun, 2007; (13) Mareschal & Clifford, 2012; (14) Dieter, Tadin, & Pearson, 2015

To study the dynamics of BR in the complete absence of attention (i.e., under conditions that preclude explicit reports of stimulus state) alternative approaches are needed (see Tsuchiya, Wilke, Frassle, & Lamme, 2015). One such alternative is to measure patterns of neural activity during unattended BR and use these to determine whether alternations in rivalry states occur in attention’s absence. In a recent study utilizing this approach, Zhang and colleagues (2011) had observers view rival stimuli for extended periods (30 s), with each stimulus flickered rapidly on and off with its own unique temporal frequency (Fig. 2a). During one condition (unattended rivalry) observers did not report their perceptual experience, instead devoting all attention to a demanding visual task at fixation. In another condition (attended rivalry), that task was not required and, instead, observers reported their experienced fluctuations in rivalry while ignoring the fixation task. Regardless of the observers’ task, the unique frequency tag of each eye’s stimulus drove dissociable neural signals that could be identified in the spectral profile of the electroencephalogram (EEG) recordings measured over the occipital lobe (Fig. 2b). Zhang et al. discovered that while attended BR results in anti-correlated power fluctuations between the two frequency bands corresponding to the two eyes’ tags (i.e., the signal related to one eye is strong while the other is weak; Fig 2b, top), no reliable relationship between the eyes’ signals was produced during unattended BR (Fig. 2b, bottom). From this result and additional control experiments, they inferred that the typically observed alternating periods of dominance and suppression during BR had ceased in the absence of visual attention. Furthermore, they found evidence that during inattention, response amplitudes were stronger at intermodulation frequencies (representing a combination of the left- and right-eyes’ stimulation frequencies). 
These so-called distortion signals suggest that rival stimuli actually form a combined (i.e., fused) neural representation when unattended. With EEG, of course, one cannot conclusively pinpoint in which specific visual areas this attentional effect originates.
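The logic behind intermodulation frequencies can be illustrated with a toy calculation. The sketch below is not the analysis used by Zhang et al. (2011): the tagging frequencies, the sampling rate, and the multiplicative interaction term are all assumptions chosen for convenience. It only shows the general signal-processing point that two frequency-tagged inputs that are merely added produce no energy at the sum frequency, whereas a nonlinear combination of them does.

```python
import math

def amplitude_at(signal, freq_hz, sample_rate_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(signal)
    re = 0.0
    im = 0.0
    for k, x in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * k / sample_rate_hz
        re += x * math.cos(phase)
        im += x * math.sin(phase)
    return 2.0 * math.hypot(re, im) / n

# Hypothetical values, not taken from the study.
SAMPLE_RATE_HZ = 200.0
DURATION_S = 4.0
F_LEFT_HZ = 6.0     # assumed tag for the left eye's stimulus
F_RIGHT_HZ = 7.5    # assumed tag for the right eye's stimulus

times = [k / SAMPLE_RATE_HZ for k in range(int(SAMPLE_RATE_HZ * DURATION_S))]
left = [math.sin(2.0 * math.pi * F_LEFT_HZ * t) for t in times]
right = [math.sin(2.0 * math.pi * F_RIGHT_HZ * t) for t in times]

# Case 1: the two tagged responses simply add (no interaction between eyes).
independent = [l + r for l, r in zip(left, right)]
# Case 2: an added multiplicative term stands in for nonlinear combination.
combined = [l + r + 0.5 * l * r for l, r in zip(left, right)]

intermod_hz = F_LEFT_HZ + F_RIGHT_HZ
amp_independent = amplitude_at(independent, intermod_hz, SAMPLE_RATE_HZ)
amp_combined = amplitude_at(combined, intermod_hz, SAMPLE_RATE_HZ)
print(f"amplitude at {intermod_hz} Hz, additive: {amp_independent:.3f}")
print(f"amplitude at {intermod_hz} Hz, nonlinear: {amp_combined:.3f}")
```

Because energy at the sum frequency can only arise where signals from the two eyes interact, a stronger response there is read as a marker of a combined representation.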

Fig. 2

EEG signatures of attended and unattended BR. a In an experiment by Zhang and colleagues (2011), observers viewed incompatible left eye and right eye images while sometimes (“Unattended Rivalry” condition) performing a demanding feature conjunction task at fixation. Even in conditions where they did not report their rivalry percept, unique frequency signatures could be decoded from the EEG signal for each eye’s image. b When rivalry was attended, the left-eye and right-eye signals fluctuated in an anti-correlated manner, which also temporally aligned with participants’ perceptual reports (shaded red/green background). However, when rivalry was unattended, there appeared to be no relationship between fluctuations of the left-eye- and right-eye-related signals, suggesting that alternations had ceased. Figure from Zhang, Jamison, Engel, He, & He, 2011; adapted with permission from Elsevier

A similar approach was used to study perceptual transitions during unattended BR using functional magnetic resonance imaging (fMRI). The perceptual experience of switching from one stimulus to the other has been described as a “traveling wave,” beginning when one portion of a stimulus flips from the left eye’s view to the right eye’s view (for example), and then propagating smoothly across the rest of the stimulus (Wilson, Blake, & Lee, 2001). When the left and right eyes are presented images differing greatly in contrast, this perceptual experience is accompanied by a reliable neural correlate in early visual areas, such that perception of the low-contrast image is associated with a weaker fMRI signal. In fMRI work that capitalized on these properties of binocular rivalry, Lee and colleagues (Lee, Blake, & Heeger, 2005; Lee, Blake, & Heeger, 2007) waited for complete predominance of the high-contrast image, and then briefly increased contrast of a small portion of the perceptually suppressed low-contrast image. This reliably caused the “triggered” portion of the low-contrast image to achieve perceptual dominance, which in turn instigated a traveling wave of dominance that slowly propagated across the rest of the stimulus. When attention was directed at the rival stimuli during this procedure, changes in blood oxygen-level dependent (BOLD) response in retinotopic cortex tracked the perceptual alternation from a high- to low-contrast image across the contiguous stimulus (Lee, Blake, & Heeger, 2005). With attention diverted from this stimulus by concurrent performance of a demanding task at fixation, this neural signature remained in V1 but could no longer be detected in V2, while it appeared to reverse in V3 (Lee, Blake, & Heeger, 2007; see Footnote 1). So, while a neural marker of perceptual switches persists under inattention conditions in V1, diverting attention seemingly changes how later visual areas respond to BR stimulation. 
The notion of a preserved marker of switches in V1 is consistent with recent optical imaging work in monkeys, showing that an alternating pattern of V1 activity in response to rivalry stimulation remains even under general anesthesia (Xu et al., 2016). In potentially related work, Roeber, Veser, Schröger, and O’Shea (2011) measured event-related potentials (ERPs) and discovered a difference in response to rivalrous stimuli compared with non-rivalrous stimuli; a difference that remained even if the stimuli were unattended (also see Katyal, Engel, He, & He, 2016). This may suggest similar treatment of attended and unattended rival stimuli by the visual system (i.e., limited impact of inattention on rivalry); however, it does not necessarily imply that rivalry alternations continue. Indeed, as demonstrated by the findings of Lee, Blake, & Heeger (2007), one may find both signatures of BR that survive inattention, along with some that are disrupted. As none of these studies distinguish exactly to what extent the normal rivalry process remains, they are not classified in Fig. 1.

Instead of measuring neural activity during periods of unattended BR, another method is to look at the consequences of unattended BR on the subsequent perceptual experience of attended BR. This method is particularly useful in cases where perception changes reliably over time, so that perceptual dominance can be predicted even when not observed directly. Flash suppression provides such a case—here, an image is presented monocularly for a second or so, followed by the onset of a rival stimulus viewed by the other eye (Wolfe, 1983). This sequence of dichoptic stimulation produces reliable dominance of the second eye’s image (and suppression of the first) at its onset (i.e., at the “flash”). Critically, if the perceptual back and forth characteristic of rivalry dynamics ensues after the onset of the second image, the initially dominant image is likely to become suppressed very shortly after the “flash,” and then to gain dominance again a few seconds after that. Brascamp & Blake (2012) found this predicted data pattern when, following flash suppression, the observer simply continued to attend to and track the perception of the rival stimuli. Specifically, averaged across repetitions, the probability of perceiving the “flashed” eye’s image was initially high, then low, and then high again as time progressed following the flash (Fig. 3a). However, if the observer instead devoted attention exclusively to a different task for a brief period right after the flash and then switched attention back to BR to report perception, there was no such temporal signature of the alternation cycle (Fig. 3b). Instead, both images were equally likely to be reported dominant regardless of the time relative to the “flash,” as would be expected if rivalry suppression does not continue during inattention. 
Consistent with this idea, perceptual reports in this second condition were indistinguishable from those in a third condition where a brief period of stimulus absence replaced the period of inattention (Fig. 3c). In other words, these results show that rivalry following a period of inattention mimics the onset of typical BR dynamics after stimulus absence, and suggest that no rivalry suppression occurs during a period when BR is unattended (Brascamp & Blake, 2012). Related preliminary results were obtained by Cavanagh and Holcombe (2006) using a paradigm in which attention was rapidly cycled among multiple rival targets. They found that perception froze (i.e., alternations ceased) at unattended locations, with the same percept remaining dominant 90% of the time, mimicking the perceptual experience when a rival image is periodically removed rather than periodically ignored (Pearson & Brascamp, 2008). Interestingly, fMRI studies have also failed to find a neural correlate of perceptual suppression during unattended flash suppression (Moradi & Heeger, 2009) or its more potent analog, continuous flash suppression (Watanabe et al., 2011). In fact, the latter study demonstrated that withdrawal of attention, but not addition of a suppressor, prompted a reduction in V1 BOLD activity (but see Yuval-Greenberg & Heeger, 2013 and Xu et al., 2016).
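The time-course measure underlying this comparison (the probability, averaged across repetitions, of reporting the “flashed” image at each moment after the flash) can be sketched as follows. The function name, the binning of reports, and the example trials are hypothetical and invented purely for illustration; they are not data from Brascamp and Blake (2012).

```python
def dominance_probability(trials):
    """Per-time-bin proportion of trials on which the flashed image was reported.

    `trials` is a list of equal-length lists. Each entry is 1 (flashed image
    reported dominant), 0 (other image reported dominant), or None (no clear
    report in that bin, which is left out of the average).
    """
    n_bins = len(trials[0])
    curve = []
    for b in range(n_bins):
        reports = [trial[b] for trial in trials if trial[b] is not None]
        curve.append(sum(reports) / len(reports) if reports else None)
    return curve

# Invented toy reports for four repetitions and five time bins after the flash.
toy_trials = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
]
curve = dominance_probability(toy_trials)
print(curve)
```

A curve that starts high, dips, and recovers corresponds to the attended pattern described above, whereas a flat curve near 0.5 corresponds to the pattern reported after inattention or stimulus absence.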

Fig. 3

Perceptual impact of inattention on subsequently attended BR. In an experiment by Brascamp & Blake (2012), observers viewed flash suppression—a variant of binocular rivalry known to produce several seconds of predictable perceptual predominance (Wolfe, 1983). a As expected, when observers attended to rivalry, they typically perceived the image first “forced” into dominance, followed by a reliable switch to the other image (dip below dashed line). b However, when rivalry was unattended for a brief period immediately following flash suppression, this reliable signature was erased. c The pattern following a period of inattention matched that produced when flash suppression was followed by a brief period of stimulus absence. Together, this pattern of results strongly suggests that rivalry ceased during the period of inattention. Figure from Brascamp & Blake, (2012); adapted with permission from Sage Publications

Another behavioral method was employed by Ling and Blake (2012), who assessed the impact of inattention on interocular suppression using negative afterimages—the vivid, luminance inverted percepts that remain following the removal of visual stimulation. When the inducing stimulus is rendered invisible by adding a suppressing stimulus, the strength of the resulting afterimage is reduced (Brascamp, van Boxtel, Knapen, & Blake, 2010) owing to dampened processing of the suppressed item. However, under inattention conditions the impact of adding a suppressing stimulus to the inducing image was eliminated (Ling & Blake, 2012; i.e., afterimages were full strength). This result suggests a reduction of suppression strength from that experienced during attended rivalry, though the authors do not conclude that this necessarily indicates an abolishment of interocular suppression.

In addition to neural measures and subsequent perceptual consequences (as used in the studies described above), one may also try to study binocular rivalry outside of attention by finding physiological metrics that reliably predict rivalry predominance. As one example, increases in pupil size regularly precede rivalry alternations (Einhauser, Stout, Koch, & Carter, 2008) and, thus, could potentially be used in the future to study the rivalry process outside of attention. One preliminary investigation by Leopold et al. (1995) used a different metric, optokinetic nystagmus (brief automatic ocular following responses to moving stimuli), to try to decode motion rivalry predominance while an observer’s attention was distracted by a peripheral visual or auditory task. They found that the pattern of eye movements recorded during attended rivalry reliably predicted image predominance and, notably, that this pattern of eye movements remained largely unchanged during unattended binocular rivalry; if anything, they noted that the rate of rivalry alternations may have accelerated outside of attention. However, these preliminary results reflect the perception of just one observer in the unattended condition, and it is therefore hard to know how they may generalize. In addition, under at least some circumstances oculomotor processing of moving stimuli can be dissociated from their conscious perception (Glasser & Tadin, 2014), making straightforward interpretation of the Leopold et al. (1995) results more difficult.

To summarize, the convergence of evidence across these diverse studies strongly suggests that BR is fundamentally altered by complete inattention (Fig. 1). Although direct observation of the perceptual dynamics of unattended BR is impossible under such conditions, results indicate that the expected neural and perceptual consequences of typical rivalry dynamics are not observed when attention is diverted from BR (though some neural signatures unique to dichoptic stimulation remain, especially in V1). Exactly what does happen to neural representations of BR stimuli during periods of inattention remains an open question, but several findings suggest that the answer may lie closer to a situation where both eyes’ images receive a comparable degree of processing (called “mixture” in our Introduction) than to a situation where one of the images dominates indefinitely (called “winner-take-all” in our Introduction). For instance, the findings of Zhang et al. (2011) are consistent with fusion of unattended rival stimuli, and Brascamp and Blake (2012) report results matching those when rivalry was removed from view. Moradi & Heeger (2009) similarly found weak suppression from adding an opposite-eye suppressor (compared to adding a same-eye stimulus), with inattention a possible explanation for the deviation of their result from typical psychophysical (e.g., Nichols & Wilson, 2009) and neurophysiological (e.g., Sengpiel, Blakemore, & Harrad, 1995) studies.

Dissociation between binocular rivalry and other forms of perceptual bistability

Given that attention seems to be a critical mechanism in driving perceptual alternations, and perhaps even suppression itself, during BR, an obvious next question is whether this generalizes to other forms of perceptual bistability. BR has been linked to other forms of perceptual bistability because of the similarity in perceptual dynamics among these phenomena (Carter & Pettigrew, 2003). However, ample evidence suggests that modulatory impacts of visual attention dissociate BR from other forms of perceptual bistability, a pattern consistent with at least partial independence of the mechanisms underlying BR (Meng & Tong, 2004; see Dieter & Tadin, 2011 for review). The uniqueness of BR once again emerges when considering effects of inattention on the dynamics of other forms of perceptual bistability.

One commonly studied form of perceptual bistability is motion-induced blindness (MIB; Bonneh, Cooperman, & Sagi, 2001), a phenomenon in which a dynamic moving background periodically suppresses a salient but stationary target. The dynamics of target disappearances during MIB in some ways mirror the alternations observed during BR, and two studies that have investigated the impact of withdrawn attention on MIB have found both parallels and differences relative to BR. In a first study, akin to studies in which BR was monitored as part of a dual-task paradigm, observers reported target disappearances in a peripheral MIB display while simultaneously directing their attention to a demanding task involving stimulation at fixation. Results indicated that target disappearances became less frequent, and lasted for longer durations, as the load of the attentional task was increased (Schölvinck & Rees, 2009). In another condition of this experiment, target dots were presented both in the left and right halves of the MIB displays, with observers instructed to report a hue change in the target on just one side of the display. Here, perceptual disappearances were much more likely to be reported on the attended side. These findings suggest slowed dynamics outside of attention, a result that appears to generalize to other bistable figures when partially attended (Fig. 1; Kohler et al., 2008; Pastukhov & Braun, 2007; Reisberg & O’Shaughnessy, 1984; Stonkute et al., 2012; but see Intaite et al., 2012). Given that slowed dynamics are also observed when BR is tracked as part of a dual-task paradigm (Alais et al., 2010; Paffen et al., 2006), this pattern of results suggests that slowed alternations under conditions of partially diverted attention are a general property shared by BR with other forms of perceptual bistability (Fig. 1).

However, this similarity between BR and other forms of perceptual bistability does not seem to extend to cases where observers’ attention is fully withdrawn from the bistable stimulus, making it impossible to monitor and report perception during bistability (Fig. 1). As outlined above, the convergence of evidence suggests that BR alternations, and perhaps suppression itself, cease in the absence of attention. Adapting the approach used by Brascamp and Blake (2012) to study BR during periods of inattention, Dieter, Tadin, and Pearson (2015) found that MIB continued to induce target disappearances even during periods of complete inattention. This study utilized a display in which both an MIB stimulus and a rapid serial visual presentation (RSVP) task were displayed, with observers instructed to switch attention between them. In one condition (inattention), observers first attended to the RSVP task before switching attention to the MIB stimulus, resulting in an initial 3- to 5-s period of complete inattention to the MIB stimulus. Results indicated that on some trials, reaction times (RTs) to detect the target dot after switching attention back to the MIB display were slow (resulting in longer median RTs; Fig. 4a). This finding is consistent with occasional perceptual disappearance of the target dot immediately following the attention shift (Dieter, Tadin, et al., 2015), suggesting that target dot suppression had occurred while attention was still directed to the RSVP task. The authors further found no evidence that the frequency of target disappearances differed between this inattention condition and one in which the RSVP stimulus, while present, was ignored (Fig. 4b). This suggests that the dynamics of MIB were unaltered during periods of inattention. 
However, because this study employed discrete trials of MIB rather than extended viewing periods (akin to BR studies utilizing a flash suppression paradigm), it is unknown whether these results generalize to longer viewing times.

Fig. 4

Inattention does not impact the dynamics of MIB. In an experiment by Dieter, Tadin, and Pearson (2015), observers viewed an MIB display for an initial period of 3-5 s, followed by a tone indicating that they should then press a key as soon as they saw the yellow target dot. On some trials, observers attended to the MIB display (“MIB only,” red) while on others they attended to a central RSVP task (“RSVP,” purple) during the initial period of 3-5 s. (a) Results indicated that reaction times (RTs) to detect the yellow target dot were slow (~900 ms) on both MIB only and RSVP “Test” trials (y-axis), i.e., trials on which the target dot was physically present for the entire trial. These “Test” trial results were compared to “Off/On” trials, on which the target dot was physically absent during the initial period and was turned on coincident with the auditory response cue, resulting in faster RTs (x-axis). The observed difference between Test and Off/On conditions suggests that the dot occasionally disappeared during the initial 3- to 5-s period of “Test” trials, even when MIB was unattended. (b) Dieter et al. also estimated the proportion of trials on which the target dot disappeared, and found no difference between attended (MIB only) and unattended (RSVP) trials, suggesting that MIB was unaltered by inattention. Figure adapted from Dieter, Tadin, and Pearson (2015); Creative Commons license

In another investigation, the influence of inattention was tested on two other forms of perceptual bistability, moving plaids and structure-from-motion (Pastukhov & Braun, 2007). Here, observers completed a demanding task at fixation in which they monitored the global motion direction of rotating “dumbbells.” In one condition, observers performed this task while concurrently reporting the state of a surrounding bistable stimulus. Consistent with previously discussed findings, alternations between competing percepts occurred at a reduced rate during this dual-task (“partial attention”) condition. In another condition, the instructions changed so that observers reported their bistable percept only rarely (once every 14 s), resulting in relatively long periods of complete inattention toward the bistable images in between the attention shifts. The authors found that alternations still occurred in this condition and, critically, that some of these switches must have occurred in between the report periods (i.e., while attention was completely diverted from bistability). Thus, for bistable plaids and structure-from-motion, like MIB, perceptual reversals seem to continue in the absence of visual attention.

A similar approach, in which periods of complete inattention were intermixed with occasional perceptual reports, was used to investigate pairs of ambiguous structure-from-motion stimuli (Mareschal & Clifford, 2012). A single such stimulus is perceived as rotating in depth, with the apparent rotation direction changing unpredictably over time. When two such displays are presented next to each other, however, they are frequently perceived as rotating in the same direction, even when brief stimulus manipulations at their onset “force” them to begin rotating in opposite directions. Interestingly, when observers diverted attention to a demanding counting task at fixation, this entrainment of perceived motion directions across the two stimuli remained (though to a significantly reduced extent). Because stimulus manipulations forced initial rotation to be in opposite directions for the two stimuli, this result implies that some direction switches must have occurred during the periods of inattention.

It is notable that current evidence indicates a marked difference between BR and other forms of perceptual bistability, with only BR being fundamentally affected outside of attention (Fig. 1). However, a factor to consider is that the methods used to achieve conditions of inattention vary widely across these studies. For one, several studies involving “complete inattention” achieved such conditions for only a few seconds, while bistable stimuli are often viewed for longer durations. In addition, we have already noted that studies utilizing a dual-task approach have found slowing of alternations across many forms of bistability including BR (Alais et al., 2010; Kohler et al., 2008; Paffen et al., 2006; Pastukhov & Braun, 2007; Reisberg & O’Shaughnessy, 1984; Schölvinck & Rees, 2009; Stonkute et al., 2012). Although there is a clear effect of attentional load in dual-task conditions, one cannot be certain that attention was entirely diverted from the rival stimulus. Indeed, the act of reporting rivalry alternations itself seemingly involves visual attention (Brascamp, Blake, & Knapen, 2015; Frässle, Sommer, Jansen, Naber, & Einhäuser, 2014; Knapen et al., 2011), making it critical that this aspect be removed from studies hoping to investigate bistability outside of attention. The distinction between BR and other forms of perceptual bistability arises when considering studies that did not require observers to report the state of the bistable stimulus during the period of inattention (Fig. 1).

The evidence seems to suggest that alternations during the viewing of bistable stimuli (other than those provoking BR) continue outside of attention (possibly at a reduced rate). While this essay groups these studies for the purpose of contrasting their effects with those of BR, there are likely mechanistic differences between these other individual forms of perceptual bistability as well, which may lead to unique impacts of attention across different bistable images. For example, MIB disappearances may result from an active “filling-in” process (Hsu, Yeh, & Kramer, 2006; New & Scholl, 2008) that would not pertain to other forms of perceptual bistability involving competing object interpretations. In addition, one notable feature common to most non-BR bistable perception studies reviewed here is the utilization of visual motion stimuli. Motion could be more resistant to complete disengagement of attention (i.e., harder to ignore). Future investigations should tease apart the idiosyncrasies of various individual forms of bistability, as well as the impact of factors such as motion on inattention.

Reconciling attention’s impact on binocular rivalry and on other forms of bistability

To reiterate two major points emerging from the evidence summarized in the previous sections: (1) the processing of stimuli driving BR is fundamentally affected under complete inattention and (2) attention, while essential for BR, is not required for the existence of perceptual fluctuations characteristic of other forms of bistability (Fig. 1). What conclusions are to be drawn from those facets of bistability? Let’s start by considering why bistability might persist in the absence of attention.

It is widely recognized that aspects of high-level mental processing can transpire without the metaphorical illumination provided by attention’s spotlight. For example, we have all had the experience of trying unsuccessfully to recall a person’s name or the title of a movie, only to have that item subsequently pop into our mind while we’re no longer consciously attempting to recall the answer. These instances imply that the search for solutions to unsolved problems can continue without our explicit attention to the source of the problem. Construed in this context, then, the challenge of resolving ambiguous or conflicting sensory information may engage interpretative mechanisms whose activity persists even when we are ignoring (i.e., failing to attend to) the stimulus provoking the conflict. This idea comports well with the popular view positing that perceptual bistability reflects a form of probabilistic inference patently revealed in circumstances where visual input is ambiguous or conflicting (Hohwy, 2012). Indeed, the idea that perception entails unconscious inference has been a bedrock notion dating back to Helmholtz (1925), and it continues to intrigue contemporary advocates (Leopold & Logothetis, 1999; Sterzer & Kleinschmidt, 2007). On this account, when unconscious inference fails to derive an unambiguous perceptual solution, instability persists even in the absence of attention.

But why should BR be an exception to this rule? Is there something unique about the perceptual response to visual conflict arising from dissimilar monocular stimulation compared to conflict associated with other forms of bistable perception? We think there could be. Consider some of the classic examples of visual stimuli that generate fluctuations in perception: some entail figures that support more than one object interpretation (e.g., duck/rabbit figure), others portray ambiguous three-dimensional (3D) perspectives (e.g., Necker cube), still others display conflicting figure/ground assignments (e.g., vase/face figure), and some simulate 3D objects whose depth-plane assignments are ambiguous (e.g., structure-from-motion). In these and other examples, it’s the perceptual interpretation of the visual object or event that fluctuates over time. Note, however, that the stimulus itself remains visible continuously, i.e., it does not undergo fluctuating periods of appearance and disappearance (Footnote 2). But such visibility fluctuations are the hallmark of BR: two dissimilar monocular stimuli compete for perceptual dominance, with the temporary loser vanquished from awareness for several seconds at a time. BR, in other words, seems to involve additional neural events in response to interocular conflict, events that do not accompany other forms of bistability. This distinction does not explain why the occurrence of BR, unlike those other forms of bistability, should depend crucially on attention, but this additional ingredient, fluctuating visibility, may be a useful clue in attempting to sort out this puzzle.

Another way in which BR is distinctive among bistable phenomena has to do with the unique way in which visual ambiguity arises. During ordinary viewing, the two eyes fixate the same object in 3D visual space, forming nearly identical images of that object centered on the foveae of the two eyes. If that fixated object happens to be an ambiguous figure, bistability ensues, but there is no disagreement between the eyes about what’s being viewed. However, monocular disagreement is precisely what instigates BR: in the laboratory, dissimilar stimuli are purposefully imaged on corresponding retinal areas, which quite often are the two foveae themselves. Now, it is true that stimulus conditions associated with BR are also present during natural viewing (Arnold, 2011; Blake & Camisa, 1978; O’Shea, 2011), but those arise from objects situated off the plane of the horopter, the imaginary curved surface in visual space (referenced to the fixation point) defined by 3D locations where visual elements cast images on corresponding locations in the two retinae and, thus, form a fused, binocular image (Footnote 3). But any stimulus falling outside of this narrow slice of space will cast images on non-corresponding areas of the two retinae, producing what is termed diplopic stimulation and creating exactly the conditions sufficient for rivalry: stimulation of corresponding retinal points by incompatible monocular inputs. So, why don’t we routinely experience BR, especially during longer fixations that should be sufficient for rivalry alternations to commence? Some of the diplopic images involve conflict between high-salience stimulation in one eye pitted against a featureless background imaged on the same retinal area of the other eye; this form of dichoptic stimulation would strongly favor dominance of the high-salience image (Ooi & He, 2006), making alternations exceedingly rare. 
In other portions of the 3D visual field, however, the dissimilar stimulation may differ only in feature content, not salience. Perhaps here is where the failure of attention comes into play: our attention is nearly always overtly focused on the contents of foveal vision (i.e., on whatever corresponds to our current point of fixation), not on other, non-fixated regions of the visual field where the stimulus conditions for rivalry may exist during natural viewing. Because the visual system searches for a solution to the correspondence problem that maximizes matched points across the global scene (Blake & Wilson, 1991), these locally mismatched patches may be treated as nothing more than noise in a matching process that otherwise finds a globally coherent solution. Of course, if you make the effort to attend to one of those regions while maintaining central fixation, you should—and will—be able to occasionally see rivalry transpiring (i.e., alternations between the eyes’ views, with occasional “mixed percepts” as in typical BR dynamics). Now, with these regions attended, they may begin producing an error signal, perhaps indicating a misalignment between the eyes. Indeed, to observe binocular rivalry using foveally viewed rival stimuli requires framing those stimuli with strong fusional locks that can overcome the intrinsic reflex to alter the vergence angle of the eyes.

One potential way this breakdown of BR outside of attention could arise is from attention’s role in perceptual grouping during BR. When BR is invoked by large stimuli, perception routinely switches at different times in different parts of the stimulus area, giving rise to an ever-changing, mosaic-like appearance (Meenes, 1930; Wilson et al., 2001). Yet at times, BR can be resolved simultaneously for multiple spatial zones across the stimulus area, suggesting that the incidence of global perceptual dominance relies on cooperative interactions across these zones (Alais & Blake, 1998; Blake, O’Shea, & Mueller, 1992; Kovács, Papathomas, Yang, & Fehér, 1996; Ooi & He, 2003). Comparable synchronization of dominance over space has also been reported when viewing multiple, ambiguous structure-from-motion animations (Freeman & Driver, 2006; Grossmann & Dobbins, 2003), with this coupling being dependent on attention (Mareschal & Clifford, 2012). Perhaps, then, when one experiences BR invoked by large stimuli, inattention does not disrupt BR as such but rather disrupts the linking of spatial zones, just as it may disrupt “surface filling-in” processes of perceptual organization (Poort et al., 2012). The resulting situation under conditions of inattention, a patchwork of dominance zones across the stimulus area, would seem compatible with the available evidence from studies that utilized large annuli (Lee et al., 2007; Moradi & Heeger, 2009; Zhang et al., 2011). Indeed, maintaining perceptual coherence by linking relevant neural activity is a key role for visual attention (Serences & Yantis, 2006). Notably, however, similar results are observed when small BR stimuli that likely do not span multiple “zones” are used (Brascamp & Blake, 2012), and the impact of inattention on suppression does not seem to depend on stimulus size (Ling & Blake, 2012). 
So, while attention’s role in maintaining perceptual coherence can explain results from studies with large rival stimuli, it seems attention may have an even more fundamental role as an essential mechanism of BR (see next section on neural models of binocular rivalry).

The fact that visual attention is required for BR also bears on the recent debate regarding the role of fronto-parietal brain regions during BR stimulation. These areas, which overlap with those typically associated with the control of visual attention, are more active during BR than during non-rivalrous “playback” conditions (Lumer et al., 1998). However, alternations between rival inputs can be inferred even under circumstances where these areas are minimally active (Brascamp, Blake, et al., 2015). This finding suggests that the activity of these regions may be tied to the perceptual decision and/or act of reporting a rivalry alternation rather than actually causing those alternations (for other bistable figures see Kornmeier & Bach, 2012). Such a pattern is consistent with a role of attention in reading out the stimulus that is currently “winning” the competition in earlier visual areas, as well as with a role in linking independent rival zones (as described above).

Incorporation of attention’s impact into neural models of binocular rivalry

Considerable effort has been directed at developing computational models of BR, and it is natural to ask whether these models may shed light on the questions addressed here. A popular category of models of BR aims to emulate the temporal dynamics of the alternation cycle in a dynamic system consisting of two components, thought of as the percepts’ representations (Wilson, 2007). These components interact through mutual inhibition and exhibit slow self-adaptation, prompting the name “adaptation-inhibition models.” Some implementations furthermore reserve a central role for other factors such as system noise (notably Moreno-Bote, Rinzel, & Rubin, 2007). Although attention is not typically considered as a factor, it is worth investigating how recent findings suggesting a requirement of visual attention for rivalry alternations might be framed in the context of these models.
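
For concreteness, one generic way of writing a two-component adaptation-inhibition system is sketched here. The notation is assumed for illustration and is not taken from any single cited model: u_i is the activity of the representation of percept i, a_i is its slow adaptation variable, I_i is the stimulus input, β is the strength of mutual inhibition, g is the strength of self-adaptation, f is a saturating activation function, τ and τ_a are fast and slow time constants (τ much smaller than τ_a), and ξ_i is an optional noise term.

```latex
\begin{aligned}
\tau \,\frac{du_i}{dt}   &= -u_i + f\!\left(I_i - \beta\, u_j - g\, a_i\right) + \xi_i(t), \\
\tau_a \,\frac{da_i}{dt} &= -a_i + u_i, \qquad i, j \in \{1, 2\},\ j \neq i .
\end{aligned}
```

In a system of this kind, the currently dominant component suppresses its competitor through the β term while its own adaptation a_i slowly builds up, which eventually allows the suppressed component to take over.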

First, this family of models is readily compatible with the finding that partial attention withdrawal can slow the alternation cycle. Withdrawing attention is sometimes thought of as reducing the effective stimulus contrast, and these models invariably predict that reductions in stimulus contrast slow the alternation cycle (at least in the high-contrast regime of most rivalry experiments; Shpiro, Curtu, Rinzel, & Rubin, 2007). The models are, in other words, consistent with the notion that slowing due to partial attention is a result of reduced effective stimulus contrast (Paffen et al., 2006).

Why complete inattention would preclude BR altogether, however, is less obvious from studying these models. In general, the models are quite sensitive to parameter settings, such that even modest changes to the parameters that control the strength of (for instance) the stimulus input, adaptation, or mutual inhibition can bring a model’s alternation cycle to a halt (Seely & Chow, 2011; Shpiro et al., 2007; Wilson, 2007). Thus, inattention, in theory, could abolish rivalry through a change in any one of the model parameters, perhaps corresponding to a change in neural responsivity or connectivity within the neural system being modeled. However, the empirical evidence for this line of reasoning is lacking. The BR models often have two distinct regimes in which rivalry alternations are abolished: either both representations are active simultaneously (similar to what we have called “mixture”), or one representation dominates the other indefinitely without any switches (what we have called “winner-take-all” behavior). Given the empirical findings that partial attention withdrawal slows BR alternations, and full attention withdrawal seemingly leads to a situation where neither percept is dominant over the other (Brascamp & Blake, 2012; Zhang et al., 2011), one would be looking for a model parameter that has the property that a small change in its value causes a slower alternation cycle and a large change leads to a mixture-like situation. In these models, however, all parameter changes that initially lead to slowing tend to lead to winner-take-all behavior at more extreme settings, rather than to mixture (Seely & Chow, 2011; Shpiro et al., 2007; Wilson, 2007). The mixture regimes of these models, on the other hand, tend to lie at the extreme end of ultra-fast alternation cycles, which is not readily reconciled with the available empirical data. 
In sum, while it is conceivable that attention acts as an external factor modulating the neural systems that these models aim to capture, existing models do not provide obvious handles for implementing this idea.
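
The parameter sensitivity described above can be illustrated with a toy simulation. The following is a minimal sketch, assuming a generic two-unit system with mutual inhibition and slow self-adaptation; the function names, the logistic activation function, and all parameter values are hypothetical choices made for illustration and are not taken from, or fitted to, any of the cited models. In this toy system, a moderate inhibition strength produces ongoing alternations, whereas a stronger inhibition strength produces winner-take-all behavior.

```python
import math


def sigmoid(x, k=0.05):
    """Steep logistic activation; the slope parameter k is an illustrative choice."""
    return 1.0 / (1.0 + math.exp(-x / k))


def simulate_rivalry(beta, duration=30.0, dt=0.0005,
                     drive=1.0, g=1.0, tau=0.01, tau_a=1.0):
    """Euler-integrate two mutually inhibiting, self-adapting units.

    beta  : strength of mutual inhibition (assumed parameter)
    g     : strength of self-adaptation (assumed parameter)
    tau   : fast time constant of activity, in seconds
    tau_a : slow time constant of adaptation, in seconds
    Returns the times (s) at which the unit with the larger activity changes.
    """
    u1, u2 = 1.0, 0.0   # unit 1 starts out dominant
    a1, a2 = 0.3, 0.7   # unit 2 starts out more adapted (arbitrary start)
    dominant = 1
    switch_times = []
    for step in range(round(duration / dt)):
        # Net input: stimulus drive minus cross-inhibition minus self-adaptation.
        in1 = drive - beta * u2 - g * a1
        in2 = drive - beta * u1 - g * a2
        du1 = (-u1 + sigmoid(in1)) / tau
        du2 = (-u2 + sigmoid(in2)) / tau
        da1 = (-a1 + u1) / tau_a
        da2 = (-a2 + u2) / tau_a
        u1 += dt * du1
        u2 += dt * du2
        a1 += dt * da1
        a2 += dt * da2
        now_dominant = 1 if u1 >= u2 else 2
        if now_dominant != dominant:
            switch_times.append((step + 1) * dt)
            dominant = now_dominant
    return switch_times


# Moderate inhibition: dominance alternates between the two units.
alternating = simulate_rivalry(beta=0.8)
# Strong inhibition: the initially dominant unit wins indefinitely.
halted = simulate_rivalry(beta=2.0)
print(len(alternating), "switches with beta=0.8;", len(halted), "switches with beta=2.0")
```

The sketch omits noise and treats attention as absent from the equations altogether, which mirrors the point made in the text: any effect of attention would have to enter such a system through a change in one of its parameters.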

Given inattention’s dramatic effect on BR, it is reasonable to also consider models where attention is an intrinsic factor of the model architecture rather than merely an external modulator. One model that fits this description is the one by Ling and Blake (2012). Above we discussed these authors’ finding that adding a suppressing stimulus to the other eye would not impact the afterimage left by an inducing image (see section, “The fate of unattended binocular rivalry”). The authors actually predicted this finding using a model in which the difference in neural activity associated with the dominant and suppressed stimuli is partly caused by the dominant stimulus blocking attention to the suppressed stimulus. Based on this model, the authors argued that this difference might be negligible under inattention conditions (where attention is absent to begin with and cannot be blocked), and then identified stimulus conditions (i.e., stimulus size and contrast values) to test this prediction using the afterimage experiment discussed above. While the authors do not conclude that rivalry has been abolished in their inattention condition, merely attenuated in amplitude, their model appears to offer a promising route toward understanding the body of findings of BR under inattention. Another model that incorporates attention explicitly in the formulation of interocular suppression is one developed by Li, Carrasco, and Heeger (2016). That model incorporates both attentional modulation and divisive normalization to produce suppression of one eye’s stimulus, with distinct effects of those two processes. Normalization serves to modulate the contrast gain associated with the rival stimuli, whereas attention is governed by a feature- and location-specific influence on the strength of neural signals prior to the implementation of normalization. 
Although they do not consider a situation where attention is withdrawn completely from both rival stimuli, the version of the model they favor includes parameters whose values, in principle, could achieve levels that would abate interocular suppression altogether.
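
The general idea of an attentional gain applied before divisive normalization can be illustrated with a static toy computation. The following is a minimal sketch under stated assumptions, not an implementation of the Li, Carrasco, and Heeger (2016) model, which is dynamic and considerably richer; the function name, the semi-saturation constant sigma, the exponent, and the gain values are all hypothetical. Each eye's stimulus drive is scaled by an attentional gain, and each response is then divided by a pool that sums the drives of both eyes.

```python
def normalized_responses(contrasts, attention_gains, sigma=0.1, exponent=2.0):
    """Divisive normalization with attentional gain applied before pooling.

    contrasts       : stimulus contrast for each eye (values between 0 and 1)
    attention_gains : multiplicative gain on each eye's drive (1.0 = no boost)
    sigma, exponent : illustrative semi-saturation constant and exponent
    """
    drives = [(gain * c) ** exponent
              for c, gain in zip(contrasts, attention_gains)]
    pool = sigma ** exponent + sum(drives)  # normalization pool shared by both eyes
    return [d / pool for d in drives]


# Equal contrast in both eyes, no attentional advantage for either stimulus.
no_attention = normalized_responses([0.5, 0.5], [1.0, 1.0])
# Same stimuli, with attention boosting the first eye's stimulus.
attend_first = normalized_responses([0.5, 0.5], [2.0, 1.0])
print("no attentional bias:", no_attention)
print("attention to first: ", attend_first)
```

In this toy computation, an attentional boost to one stimulus raises its response and, through the shared normalization pool, lowers the response to the other stimulus. When neither stimulus receives an attentional advantage, the two responses are identical, so the response difference between the stimuli vanishes.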

Conclusions

We find that the convergence of evidence suggests that BR is fundamentally affected outside of visual attention, while alternations accompanying other forms of perceptual bistability continue in a relatively unaltered fashion. This dissociation complements the one previously described for modulatory effects of attention on perceptual bistability, with selective attention having a much more modest impact on BR than on other types of bistable figures (Dieter & Tadin, 2011). Indeed, the current summary may offer an explanation of that pattern: if visual attention is already engaged in the process of simply driving states of dominance and suppression during rivalry, there may be few resources remaining for further modulatory impacts. Future research in this area will be necessary to determine the potential impacts on everyday natural vision, where we likely encounter unattended diplopic images with regularity.