The resonant brain: How attentive conscious seeing regulates action sequences that interact with attentive cognitive learning, recognition, and prediction
This article describes mechanistic links that exist in advanced brains between processes that regulate conscious attention, seeing, and knowing, and those that regulate looking and reaching. These mechanistic links arise from basic properties of brain design principles such as complementary computing, hierarchical resolution of uncertainty, and adaptive resonance. These principles require conscious states to mark perceptual and cognitive representations that are complete, context sensitive, and stable enough to control effective actions. Surface–shroud resonances support conscious seeing and action, whereas feature–category resonances support learning, recognition, and prediction of invariant object categories. Feedback interactions between cortical areas such as peristriate visual cortical areas V2, V3A, and V4, and the lateral intraparietal area (LIP) and intraparietal sulcus (IPS) of the posterior parietal cortex (PPC) control sequences of saccadic eye movements that foveate salient features of attended objects and thereby drive invariant object category learning. Learned categories can, in turn, prime the objects and features that are attended and searched. These interactions coordinate processes of spatial and object attention, figure–ground separation, predictive remapping, invariant object category learning, and visual search. They create a foundation for learning to control motor-equivalent arm movement sequences, and for storing these sequences in cognitive working memories that can trigger the learning of cognitive plans with which to read out skilled movement sequences. Cognitive–emotional interactions that are regulated by reinforcement learning can then help to select the plans that control actions most likely to acquire valued goal objects in different situations.
Many interdisciplinary psychological and neurobiological data about conscious and unconscious behaviors in normal individuals and clinical patients have been explained in terms of these concepts and mechanisms.
Keywords: Spatial attention · Object attention · Saccadic eye movement · Arm movement · Movement sequences · Complementary computing · Hierarchical resolution of uncertainty · Adaptive resonance · Consciousness · Surface–shroud resonance · Feature–category resonance · Invariant object category learning · Neon color spreading · Boundary completion · Surface filling-in · Figure–ground separation · Cognitive working memory · Cognitive plan · V2 · V3A · V4 · LIP · IPS · PPC · PFC
1. Introduction: How conscious resonant dynamics link perception and cognition to action
This article summarizes a radical departure from the classical view that sensory inputs are transformed via feedforward processes from perception to cognition to action, with little regard for processes of visual attention, memory, learning, decision-making, and interpersonal interaction. Instead, the article summarizes how feedback occurs ubiquitously in our brains to regulate processes of Consciousness, Learning, Expectation, Attention, Resonance, and Synchrony, the so-called CLEARS processes. The CLEARS processes are realized by building upon basic brain designs such as complementary computing, hierarchical resolution of uncertainty, and adaptive resonance that will be described below.
The brain processes that carry out complementary computing and hierarchical resolution of uncertainty clarify not only how and where conscious states of mind occur in advanced brains but also why evolution may have been led to discover conscious states of mind. In brief, conscious states are needed to control the choice of task-relevant actions. This article thus argues that a full understanding of links between cognition and action cannot be achieved without first understanding the fundamental mechanistic link that exists between conscious perceptual and cognitive representations and the choice of effective actions. The article will accordingly describe how a particular hierarchical resolution of uncertainty that occurs in the visual system enables conscious states to be activated that focus spatial attention upon object surfaces. The surface–shroud resonance that sustains spatial attention on an object surface also controls sequences of saccadic eye movements that foveate the object’s salient features. The scanned salient features, in turn, enable learning an invariant object category with which to recognize and predict the object. The foveated positions can also activate reaching movements with which to manipulate the object. Sequences of looking and reaching movements can be stored in working memory, thereby enabling learning of cognitive and motor plans whereby skilled sequential movements can be carried out. Cognitive–emotional interactions help to select the plans that are appropriate in different environments.
These goals cannot be achieved without first understanding how the CLEARS processes contribute to these goals. First and foremost, the CLEARS processes help to solve the stability–plasticity dilemma, whereby advanced brains can learn quickly without catastrophically forgetting already learned, but still useful, knowledge at unpredictable times. By solving the stability–plasticity dilemma, humans can rapidly learn enormous amounts of new information, on their own, throughout life, and can integrate all this information into unified conscious experiences that cohere into a sense of self.
Currently popular machine learning algorithms, such as back propagation and deep learning, do experience catastrophic forgetting, in addition to being unable to learn quickly or autonomously in response to a changing world in real time.
Adaptive resonance theory, or ART, solves the stability–plasticity dilemma by showing how the CLEARS processes work together to enable our brains to autonomously learn to attend, recognize, and predict objects and events in a changing world. ART was led to predict that “all conscious states are resonant states” as part of its specification of mechanistic links between the CLEARS processes. These mechanistic links explain data ranging from individual spikes and their synchronization to the dynamics of conscious and unconscious perceptual, cognitive, and cognitive–emotional experiences. ART currently provides unified explanations of much more interdisciplinary data in these areas than other available theories, and all the main ART hypotheses have been supported by subsequent experiments. See Grossberg (2013, 2017b, 2018) for recent expositions.
Feature–category resonances solve the stability–plasticity dilemma
A resonance is a dynamical state during which neuronal firings across a brain network are amplified and synchronized when they interact via reciprocal excitatory feedback signals during a matching process that occurs between bottom-up and top-down pathways. Such a resonance can trigger fast learning that incorporates the attended critical feature pattern into the LTM traces within the bottom-up adaptive filters that activate recognition categories, and the top-down expectations that are read out by them—hence the name adaptive resonance—while suppressing outliers that could have caused catastrophic forgetting, and thereby solving the stability–plasticity dilemma.
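The match-then-learn cycle just described can be caricatured in a few lines. The sketch below is a minimal ART-1-style illustration, assuming binary feature vectors, a simple choice-by-overlap rule, and fast learning; the function name and parameters are illustrative simplifications, not the published model equations.

```python
import numpy as np

def art_learn(patterns, vigilance=0.75, beta=1.0):
    """Minimal ART-1-style match/reset cycle (illustrative sketch):
    bottom-up category choice, top-down match, then resonance or reset."""
    weights = []   # one learned prototype (top-down expectation) per category
    labels = []
    for p in patterns:
        p = np.asarray(p, dtype=float)
        # Bottom-up adaptive filter: try the most activated category first.
        order = sorted(range(len(weights)),
                       key=lambda j: -np.minimum(p, weights[j]).sum())
        chosen = None
        for j in order:
            # Match the attended critical feature pattern (input AND
            # expectation) against the bottom-up input.
            match = np.minimum(p, weights[j]).sum() / max(p.sum(), 1e-9)
            if match >= vigilance:   # resonance: match is good enough
                # Fast, match-based learning refines only attended features,
                # so outliers cannot cause catastrophic forgetting.
                weights[j] = beta * np.minimum(p, weights[j]) \
                             + (1 - beta) * weights[j]
                chosen = j
                break
            # Otherwise: mismatch reset; search the next category.
        if chosen is None:           # no category matches: learn a new one
            weights.append(p.copy())
            chosen = len(weights) - 1
        labels.append(chosen)
    return labels, weights
```

Note how vigilance controls the trade-off: a high vigilance forces finer categories, while a low vigilance allows broader generalization, all without overwriting previously learned prototypes.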
Object attention obeys the ART Matching Rule
The object attentional feedback that enables ART matching to occur obeys the ART Matching Rule, which was predicted to be realized by top-down, modulatory on-center, off-surround networks. Such networks can prime expected feature patterns with their top-down modulatory on-centers, while also inhibiting unexpected features via their off-surrounds. When these top-down selective attention circuits are embodied within the larger neural architectures that the current article describes, they provide a rigorous mechanistic interpretation of concepts like “action-centered attention” (Tipper, Lortie, & Baylis, 1992) and “affordance competition” (Cisek, 2007). Indeed, the affordance competition hypothesis uses the kind of recurrent on-center off-surround networks, also called recurrent competitive fields, from which ART Matching Rule circuits are constructed (Bullock, Cisek, & Grossberg, 1998; Cisek, Grossberg, & Bullock, 1998; Grossberg, 1973).
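A toy version of this top-down priming can be sketched as follows. The one-step steady state below stands in for the full shunting dynamics, and the gain and surround terms are illustrative assumptions; the key properties it preserves are that the on-center is modulatory (top-down signals alone cannot fire cells) and that the off-surround suppresses unexpected features.

```python
import numpy as np

def art_match(bottom_up, top_down, gain=1.0):
    """Sketch of the ART Matching Rule: a top-down, modulatory on-center,
    off-surround network (simplified one-step steady state)."""
    b = np.asarray(bottom_up, dtype=float)
    t = np.asarray(top_down, dtype=float)
    # On-center: the top-down expectation multiplicatively primes matched
    # features but, being modulatory, cannot activate cells by itself.
    center = b * (1.0 + gain * t)
    # Off-surround: nonspecific top-down inhibition suppresses features
    # that are not part of the expected pattern.
    surround = gain * t.sum() / len(t)
    return np.maximum(center - surround, 0.0)
```

With a matching bottom-up input, expected features are amplified relative to unexpected ones; with no bottom-up input, a prime leaves all cells silent, which is what "modulatory" means here.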
It has been discovered over the years that adaptive resonances generate parametric properties of individual conscious experiences of seeing, hearing, feeling, and knowing. ART has by now reached sufficient maturity to begin classifying the brain resonances that support conscious awareness during such experiences. Psychological and neurobiological data about conscious and unconscious experiences in both normal individuals and clinical patients have been clarified by this classification (e.g., Franklin & Grossberg, 2017; Grossberg, 2017a, 2017b, 2018; Grossberg & Kishnan, 2018; Grossberg, Palma, & Versace, 2015; Grossberg & Versace, 2008). This analysis also explains why not all resonances become conscious, and why not all brain dynamics are resonant, as discussed in Sections 3 and 4 below.
Sections 2 and 4 will summarize the fact that many advanced neocortical systems are organized into pairs of parallel processing streams that obey computationally complementary laws. The streams interact together using multiple processing stages to overcome the uncertainties that each stream, acting alone, would face. As noted above, such a hierarchical resolution of uncertainty clarifies why the evolutionary process was driven to discover conscious states upon which reliable actions could be based. Sections 5 and 6 will review complementary properties of perceptual/cognitive processes in the ventral “what” cortical stream and spatial/action processes in the dorsal “where” cortical stream. Sections 7 and 8 will describe the hierarchical resolutions of uncertainty that occur in the visual system in order to compute the boundary and surface representations that can be used for seeing, recognition, and action. Sections 9 and 10 summarize the surface–shroud resonances and feature–category resonances that build upon these processes. Sections 11 and 12 explain how invariant object categories are learned during free scanning of a scene, including how sequences of eye movements are generated to foveate salient features of different object views as invariant learning takes place. Section 13 summarizes how this foundation supports motor-equivalent sequences of arm movements to these salient features, and how these sequences may be stored within the prefrontal cortex in cognitive working memories that trigger the learning of cognitive plans. When modulated by cognitive–emotional interactions that are sculpted during reinforcement learning and incentive motivational learning, these cognitive plans may be used to choose the movements that will most probably acquire valued goals in different environments.
2. Why was evolution led to discover conscious states? Hierarchical resolution of uncertainty
ART goes beyond explanations of how, where, and when conscious states may be generated within our brains. It also proposes why evolution may have been driven to discover conscious states in the first place. This explanation follows naturally from the design principles that our brains use to autonomously adapt in real time to a changing world that may be filled with unexpected events. One of these design principles is called the hierarchical resolution of uncertainty. Hierarchical resolution of uncertainty means that it often takes multiple processing stages for our brains to generate a sufficiently complete, context-sensitive, and stable perceptual representation upon which to base a successful action.
Multiple processing stages are needed to complete 3-D boundary and surface representations with which to more informatively represent the scene (e.g., Grossberg, 2017b). Doing so requires that three hierarchical resolutions of uncertainty occur, which will be described below.
ART predicts that the processing stage where such a sufficiently complete, context-sensitive, and stable surface representation is completed “lights up” into a conscious state due to a resonance with a subsequent processing stage that marks this surface representation as being a good enough one upon which to base a successful action of looking or reaching. Such a resonance is called a surface–shroud resonance because the completed representation is a surface representation, and the form-fitting spatial attentional representation that resonates with it is called an attentional shroud (Fazl, Grossberg, & Mingolla, 2009; Tyler & Kontsevich, 1995). Surface–shroud resonances are predicted to be triggered by interactions between prestriate visual cortical area V4 and the posterior parietal cortex (PPC), before propagating both top-down to lower cortical areas such as V2 and V1, and bottom-up to higher cortical areas such as prefrontal cortex (PFC). Had earlier processing stages been used to trigger these actions, the wrong actions could have been generated, with potentially disastrous consequences for survival. This conscious state hereby provides an “extra degree of freedom” that enables our brains to avoid prematurely generating responses using inadequate perceptual representations.
In this way, ART clarifies that there is an intimate link between conscious states of seeing, hearing, feeling, and knowing, and the choice and execution of context-appropriate actions. ART proposes how resonances for conscious seeing help to ensure effective looking and reaching, resonances for conscious hearing help to ensure effective communications including speaking, and resonances for conscious feeling help to ensure effective goal-directed actions. ART also proposes how, when we consciously see a familiar valued object, we can also know some things about it, and have appropriate feelings about it.
The PPC can be both a source of top-down spatial attention with which to resonate with visual surface representations during a surface–shroud resonance, and of bottom-up motor commands to move the eyes and arms to attended positions in space, leading to the distinction between attention and intention in descriptions of parietal function (e.g., Andersen, Essick, & Siegel, 1985; Gnadt & Andersen, 1988; Snyder, Batista, & Andersen, 1997, 1998, 2000).
Table 1 a Types of resonances and the conscious experiences that they embody. b Complementary “what” and “where” cortical stream properties. Cortical “what” stream perceptual and cognitive representations can solve the stability–plasticity dilemma, using brain regions like inferotemporal (IT) cortex, where recognition categories are learned. These processes carry out excitatory matching and match-based learning. Cortical “where” stream spatial and motor processes often carry out inhibitory matching and mismatch-based learning that do not solve the stability–plasticity dilemma, but rather adapt to changing bodily parameters, using brain regions like posterior parietal cortex (PPC). Whereas the recognition categories in the cortical “what” stream become increasingly invariant at higher cortical levels with respect to object views, positions, and sizes, the cortical “where” stream elaborates spatial representations of object positions and mechanisms whereby to act upon them. Together, the two streams can learn to recognize and become conscious of valued objects and scenes, while directing appropriate actions towards them
3. All conscious states are resonant states, but not conversely
Although ART predicts that “all conscious states are resonant states,” it does not predict that “all resonant states are conscious states.” Resonant states that are not accessible to consciousness, but that nonetheless dynamically stabilize learned memories, include parietal-prefrontal resonances that trigger the selective opening of basal ganglia gates to enable the readout of contextually appropriate thoughts and actions (Brown, Bullock, & Grossberg, 2004; Buschman & Miller, 2007; Grossberg, 2016b) and entorhinal-hippocampal resonances that dynamically stabilize the learning of entorhinal grid cells and hippocampal place cells during spatial navigation (Grossberg & Pilly, 2014; Kentros, Agnihotri, Streater, Hawkins, & Kandel, 2004; Morris & Frey, 1997; Pilly & Grossberg, 2012). These resonances do not include feature detectors that are activated by external senses—such as those that support vision or audition—or internal senses—such as those that support emotion. Hence, they cannot become conscious.
4. Complementary computing and hierarchical resolution of uncertainty
Another reason why not all brain dynamics may lead to conscious states is that not all brain dynamics can become resonant, notably, spatial and motor processes, a property that is relevant for understanding how conscious perception and cognition are linked to action. The fact that not all brain dynamics are resonant is due to complementary computing (Grossberg, 2000, 2013, 2017b).
Complementary computing concerns the discovery that pairs of parallel cortical processing streams compute computationally complementary properties in the brain. The existence of processing streams is consistent with the idea that brain processing is specialized, but it does not imply that these streams contain independent modules. For example, Cavanagh (1986) has described independent modules for luminance, motion, binocular disparity, color, and texture that are combined together into more complex visual attributes at higher cortical processing stages. Independent modules should be able to fully compute their particular processes on their own. Much perceptual data argue against such independence. In particular, changes in perceived form or color can cause changes in perceived motion, and conversely. Changes in perceived brightness can cause changes in perceived depth, and conversely. For example, making an object in a picture brighter can make it look closer, relative to other objects in the scene, a property that is often called proximity-luminance covariance (Dosher, Sperling, & Wurst, 1986; Schwartz & Sperling, 1983).
Complementary computing explains such strong interactions between perceptual qualities by showing that each cortical processing stream has complementary computational strengths and weaknesses. These streams overcome their complementary deficiencies by interacting with one another using multiple processing stages that realize a hierarchical resolution of uncertainty, leading to perceptual representations that overcome the complementary uncertainties that each stream, on its own, would compute. The result is sufficiently complete, context-sensitive, and stable enough representations upon which successful actions can be based.
5. Complementary perceptual/cognitive and spatial/action streams: Tying cognition to action
Table 1b summarizes basic complementary properties of the “what” cortical stream for perception and cognition, and of the “where” cortical stream for spatial representation and action (Mishkin, 1982; Mishkin, Ungerleider, & Macko, 1983). Perceptual/cognitive processes in the “what” stream, which include the inferotemporal cortex, or IT, often use ART-like excitatory matching and match-based learning to create self-stabilizing categorical representations of objects and events that solve the stability–plasticity dilemma. An example of excitatory matching is that if you are primed to expect to see a yellow ball in a certain place, then you can recognize it more quickly and vigorously than if you were not primed. These excitatory matching and match-based learning processes enable increasing expertise, and an ever-expanding sense of self, to be rapidly and stably learned throughout life.
Table 1b also summarizes that complementary spatial/motor processes in the “where” stream, which include the posterior parietal cortex, or PPC, often use inhibitory matching and mismatch-based learning to continually update spatial maps and motor controllers that enable our changing bodies to carry out appropriate actions throughout life. This kind of inhibitory processing is often called Vector Associative Map, or VAM, processing (Gaudiano & Grossberg, 1991, 1992). Inhibitory matching subtracts an outflow representation of where our arm is now in space—a present position vector—from one that computes the position where we want to move—a target position vector—to compute a representation of the direction and distance of a desired movement—a difference vector (Bullock & Grossberg, 1988; Evarts & Tanji, 1974; Georgopoulos, Kalaska, Caminiti, & Massey, 1982; Georgopoulos, Schwartz, & Kettner, 1986; Kalaska, Caminiti, & Georgopoulos, 1983). When the arm reaches the position where we want it to be, the target and present position vectors both code the same position in space, so the difference vector equals zero.
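This inhibitory match can be sketched as a simple integrator, in the spirit of the VITE model of Bullock and Grossberg (1988); the discrete-time form, step size, and GO gain below are illustrative simplifications of the published dynamics.

```python
import numpy as np

def reach(target, present, dt=0.1, go=1.0, steps=200):
    """Sketch of VITE-style inhibitory matching: the difference vector
    D = T - P codes the direction and distance left to move, and a
    volitional GO signal gates how fast the present position P
    integrates D toward the target T."""
    T = np.asarray(target, dtype=float)
    P = np.asarray(present, dtype=float)
    for _ in range(steps):
        D = T - P              # inhibitory match: present subtracted from target
        P = P + dt * go * D    # GO-gated integration moves the arm along D
    return P, T - P            # final position and residual difference vector
```

The movement stops automatically when the difference vector reaches zero, with no need for an explicit stop command, which is why recalibrating the present position vector suffices to keep reaches accurate as the body changes.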
This kind of inhibitory matching cannot solve the stability–plasticity dilemma for two reasons. First, an inhibitory match cannot support an excitatory resonance, and thus cannot dynamically stabilize its learned representations using a resonant mechanism. Second, VAM mismatch learning calibrates a target position vector to equal the present position vector that is active when they both represent the same position in space. Thus, whenever bodily relationships change throughout life due to development, growth, exercise, and aging, the new present positions that are generated as a result will recode the corresponding target positions. Spatial and motor learning is hereby continually overwritten by new experiences, so that our brains can continue to learn how to accurately move our bodies as they change throughout life. Because they cannot resonate, spatial and motor representations, which are often called procedural memories (N. J. Cohen & Squire, 1980; Mishkin, 1982; Scoville & Milner, 1957; Squire & Cohen, 1984), cannot generate conscious internal representations; that is, there are no motor “qualia” that consciously represent the target and present positions of a planned action, even though we can consciously will the action to occur by choosing a target position or learned plan to execute a motor skill, and execute it by activating a volitional GO signal, as Section 13 explains in greater detail.
6. Invariant object category learning, Where/How stream, and reaching with visual form agnosia
An important reason for the “what–where” complementarity summarized in Table 1b is that the cortical “what” stream learns object recognition categories that become substantially invariant under changes in an object's view, size, and position at higher cortical processing stages, such as at the anterior inferotemporal cortex (ITa) and beyond (e.g., Booth & Rolls, 1998; Cao, Grossberg, & Markowitz, 2011; Chang, Grossberg, & Cao, 2014; Fazl et al., 2009; Tanaka, 1997, 2000). These invariant object categories enable our brains to recognize valued objects without experiencing the combinatorial explosion that would occur if they needed to store every individual experience, or exemplar, of every familiar object. However, because they are spatially invariant, these categories cannot locate and act upon a desired object in space. Cortical “where” stream spatial and motor representations can locate objects and trigger actions toward them, but cannot recognize them. By interacting together, the “what” and “where” streams can consciously see and recognize valued objects and direct appropriate goal-oriented actions toward them. Complementary computing hereby clarifies classical data that the cortical “where” stream is also a cortical “how” stream for the control of action, and is thus often called the “where/how” stream (Goodale & Milner, 1992; Goodale, Milner, Jakobson, & Carey, 1991). In particular, a top-down cognitive prime from the prefrontal cortex of the “what” cortical stream can bias how spatial attention is allocated in the “where” cortical stream and, with it, the actions that are thereby triggered (e.g., Baldauf & Desimone, 2014; Bichot, Heard, & DeGennaro, 2015; Fuster, 1973; Grossberg, 2018).
Studies of how these streams interact have clarified how some actions can occur without conscious knowledge of the objects to which they are directed. This occurs, for example, during visual form agnosia. The famous agnosic patient, D. F., was reported by Milner et al. (1991; see also Binsted, Brownell, Vorontsova, Heath, & Saucier, 2007; Milner & Goodale, 1995). When D. F. visually inspected an oriented slot, her reports of the orientation of the slot showed little relationship to its actual orientation, whether her reports were made verbally or manually. However, when D. F. was asked to insert her hand, or a hand-held card, into the slot, she did so accurately. In addition, her hand began to rotate in the appropriate direction as soon as it left the start position. In summary, although D. F. did not “know” the orientation of the slot, she could “see” the slot and insert her hand, or post a card, into it with considerable skill. How this can happen will also be explained below.
7. Three hierarchical resolutions of uncertainty to complete visual boundaries and surfaces
In order to understand how a surface–shroud resonance can support conscious seeing of visual qualia (see Table 1a) and thus be able to look at and reach attended objects, it is necessary to summarize basic cortical processes about how the brain sees. Perhaps the most basic fact about 3-D vision and figure–ground perception is that its functional units are 3-D boundaries and surfaces, processes that were first modeled in Grossberg (1984) and have enabled subsequent explanations and predictions of many data, including how looking at 2-D pictures can generate conscious 3-D percepts of occluding and occluded objects (e.g., Cao & Grossberg, 2005; Fang & Grossberg, 2009; Grossberg, 1994, 1997, 2016a; Grossberg & Yazdanbakhsh, 2005; Kelly & Grossberg, 2000), whose properties will be shown in Sections 11, 12, and 13 to be important for control of looking and reaching.
Neon color spreading, end gaps, and end cuts
End gaps are created in the following way: The boundary cells that are activated by the image in Fig. 4 are contrast-sensitive and orientationally tuned. These include the simple cells and complex cells in Fig. 5. Both the simple and complex cells that are activated by the black–white image contrasts become more active than the cells that are activated by blue–white contrasts. The active complex cells excite hypercomplex cells at their own positions at the next processing stage, while inhibiting neighboring hypercomplex cells via a short-range spatial competition network (see Fig. 5). Due to the contrast sensitivity of hypercomplex cell responses, the stronger black–white boundary signals inhibit nearby blue–white boundary cells more than conversely, thereby weakening the contiguous blue–white boundary—that is, creating an end gap.
The boundary cells at the hypercomplex level where end gaps form are tonically active and inhibit other boundary cells that are tuned to different orientations at the same position (see Fig. 5)—that is, by an orientational competition. In the absence of external inputs, the tonic activity of these cells is held in check by their mutual competition. When blue–white boundary cells are inhibited, the competitive balance is upset, causing cells that are tuned to other orientations, notably, the perpendicular orientation, to be disinhibited and to thereby create an extra boundary segment that is called an end cut.
In summary, end gaps and end cuts are formed as a result of two successive stages of spatial and orientational competition between contrast-sensitive and orientationally tuned hypercomplex cells (Grossberg, 1984; Grossberg & Mingolla, 1985).
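These two competition stages can be caricatured in one dimension. The sketch below assumes a single row of horizontally tuned boundary cells abutted by a tonically active perpendicular channel; the inhibition strength and tonic level are illustrative parameters, not fitted values.

```python
import numpy as np

def end_gap_end_cut(horiz, inhib=0.8, tonic=0.2):
    """Toy sketch of the two competition stages that create end gaps
    and end cuts (illustrative parameters)."""
    h = np.asarray(horiz, dtype=float)   # horizontal boundary strengths along a line
    # Stage 1, spatial competition: each hypercomplex cell is inhibited by
    # its neighbors, so strong (black-white) boundaries suppress adjacent
    # weak (blue-white) boundaries more than conversely, creating an end gap.
    neighbors = np.convolve(h, [1, 0, 1], mode="same") / 2.0
    h_out = np.maximum(h - inhib * neighbors, 0.0)
    # Stage 2, orientational competition: tonically active perpendicular
    # (vertical) cells are normally held in check by the horizontal cells;
    # where the horizontal boundary collapses, they are disinhibited,
    # creating an end cut.
    v_out = np.maximum(tonic - h_out, 0.0)
    return h_out, v_out
```

Feeding in two strong boundary positions followed by two weak ones shows the weak boundary collapsing where it abuts the strong one (the end gap), and the perpendicular channel activating at exactly that position (the end cut).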
Simple cells cannot detect line ends, but hypercomplex cells can
Although end cuts do not prevent all spreading of brightness and color from occurring, as neon color spreading illustrates (see Fig. 4), events like neon color spreading are rare. They also provide useful evidence for how our brains form boundaries and surfaces by showing how these processes can occasionally break down.
Complex cells can detect boundaries where contrast polarities reverse, but cannot see qualia
Pooling inputs from opposite contrast polarities at complex cells implies that boundaries cannot represent visual qualia. They cannot discriminate between dark–light and light–dark contrasts, or red–green and green–red contrasts, or blue–yellow and yellow–blue contrasts, because they pool together inputs from simple cells that are sensitive to all of these differences (Thorell, de Valois, & Albrecht, 1984) to form the best possible boundaries. In other words, boundaries are insensitive to direction of contrast. Although boundaries can vary in strength or distinctiveness as they receive inputs from variable numbers and strengths of inducers, they do not code for visible brightnesses or colors.
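The polarity pooling that makes boundaries insensitive to direction of contrast can be illustrated in one dimension. The rectified differencing below is an assumed stand-in for actual simple-cell receptive fields, chosen only to make the pooling step explicit.

```python
import numpy as np

def boundary_strength(image_row):
    """Sketch of why boundaries are insensitive to direction of contrast:
    a complex cell pools rectified simple-cell responses of both polarities,
    so dark-to-light and light-to-dark edges yield the same boundary signal."""
    x = np.asarray(image_row, dtype=float)
    diff = np.diff(x)                     # local luminance contrast
    dark_light = np.maximum(diff, 0.0)    # simple cell, one polarity
    light_dark = np.maximum(-diff, 0.0)   # simple cell, opposite polarity
    return dark_light + light_dark        # complex cell pools both: |diff|
```

Because the pooled signal equals the absolute contrast, an edge and its polarity-reversed counterpart produce identical boundaries, which is exactly why boundaries cannot, by themselves, represent visible brightness or color.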
If boundaries are invisible, then how do we see anything? Consciously perceived qualia are predicted to be surface percepts (see Fig. 11). Visible surface percepts can be generated by different surface brightnesses or colors that may occur on two sides of a boundary after surface filling-in occurs, as illustrated by the enhanced brightness of the squares generated by the Kanizsa square stimuli in Fig. 12 (top row, left column; bottom row).
Boundary completion closes retinal boundary gaps using bipole grouping cells
Many boundaries would still remain incomplete if boundary processing stopped with hypercomplex cells. For example, the two Kanizsa squares in the top row of Fig. 12 would just be seen and recognized as four Pac-Man figures. Why does the brain bother completing boundaries, indeed illusory boundaries, between pairs of collinear Pac-Man edges?
To summarize what has already been described: The first hierarchical resolution of uncertainty uses hypercomplex cells to complete boundaries at line ends and corners that simple cells cannot detect. The second hierarchical resolution of uncertainty uses bipole grouping to complete positionally sharp boundaries at positions that are occluded by the blind spot and retinal veins, or behind occluders in any scene or image. The third hierarchical resolution of uncertainty concerns why surface filling-in occurs.
Filling-in completes surface representations after the illuminant is discounted
Completed boundaries input topographically to surface representations where they are both generators of, and barriers to, surface filling-in (Grossberg, 1994, 1997; Grossberg & Yazdanbakhsh, 2005; Kelly & Grossberg, 2000). These boundary-to-surface signals are predicted to occur from boundary representations within the interstripes of cortical area V2 to surface representations within Filling-In DOmains, or FIDOs, of the thin stripes of cortical area V2 (Figure 3). Each FIDO also receives bottom-up topographic brightness or color signals. For example, when blue color inputs in response to the neon color image in Fig. 4 activate the corresponding FIDO, blue color can spread outward in an unoriented manner across this FIDO. In particular, because the boundaries of the blue lines in Fig. 4 have lower contrast than those of the black lines, end gaps form in the boundaries generated by the blue lines where they abut the black lines. Blue color can flow out of these end gaps and spread across space until it hits the square illusory boundary that is completed by bipole grouping, which prevents its further spread.
In addition to its outward and unoriented spread, surface filling-in is also sensitive to direction of contrast, because we can consciously see its effects. Neon color spreading hereby illustrates three pairs of computationally complementary properties of boundary completion and surface filling-in (see Fig. 11): oriented versus unoriented; inward versus outward; insensitive to direction of contrast versus sensitive to direction of contrast. A good boundary completion process thus cannot also be a good surface filling-in process, and conversely. Interactions between these processes overcome their complementary deficiencies to generate completed boundaries and filled-in surfaces.
In what sense is surface filling-in an example of hierarchical resolution of uncertainty? The surface system “discounts the illuminant,” or compensates for variable illumination, at an early processing stage. If this did not happen, then the brain could erroneously process changes in illumination as changes in perceived object shapes and colors. If object shapes could plastically deform whenever illumination changed, then the brain could not learn to recognize a stable object percept.
At a later FIDO processing stage, surface filling-in spreads feature contour brightnesses and colors within the closed rectangular boundary contours to reconstruct a surface representation of the scene in which the illuminant is significantly discounted. This filled-in red rectangle is depicted in the After Filling-in image that is directly under the feature contour figure, above the label No Gap. Henceforth, this filled-in figure will be cited as Fig. 16 (left column, middle row). Surface filling-in of illumination-discounted feature contours is the third hierarchical resolution of uncertainty.
8. Recognizing occluded objects while seeing unoccluded opaque surfaces and transparent ones
The above boundary and surface interactions are necessary to understand how conscious states control actions, but they are not sufficient. In order to make the links to consciousness and action, it is also necessary to understand how boundaries and surfaces support 3-D figure–ground separation. The discussion so far has considered only how boundaries and surfaces work in two dimensions, or 2-D. In the real world, however, boundary completion and surface filling-in do their work in response to 3-D scenes that may contain partially occluded objects. In a 3-D world, the following questions also need to be answered: How do we recognize completed objects behind their occluders? Why do we see only the unoccluded parts of opaque objects, yet can also see occluded objects behind transparent occluders? How do conscious states respond to such figure–ground representations to trigger actions?
Closed boundaries contain depth-selective filling-in
How did evolution discover figure–ground separation? Remarkably, properties of figure–ground separation emerge from interactions that compute complementary consistency. Recall from Figs. 4 and 11 that the rules that govern boundary completion and surface filling-in are computationally complementary. Nonetheless, we typically consciously see, with fixed attention, only one percept in response to an image, except in special circumstances such as those that cause binocular rivalry. Complementary consistency is realized when the signals within V2 from the boundary stream to the surface stream that create barriers to the filling-in of object surfaces trigger feedback signals from the surface stream back to the boundary stream. To understand how this works, consider the image in Fig. 16, labeled Gap (right column, middle row). This image has a big gap, or hole, in its boundary. As a result, brightness and color can flow out of the boundary into the surrounding image, and conversely. The net effect is to equalize brightness and color contrasts on both sides of the boundary.
Surface contours are activated at positions where closed boundaries contain filling-in
The processing stage after surface filling-in occurs computes topographically organized feedback signals, called surface contours, that project back to the generative boundaries. Surface contours are generated by contrast-sensitive on-center off-surround networks that act across space and within each depth. Because of their contrast sensitivity, these networks generate output signals only at positions where they detect a rapid change in contrast across space. Such rapid contrast changes occur only at the contours of successfully filled-in surfaces, which are the surfaces that are surrounded by closed boundaries. Such a filled-in surface has already been described by the No Gap image in Fig. 16 (middle row, left column). The surface contour that is generated by this filled-in surface is shown just below it as a blue rectangle around the red region in Fig. 16 (bottom row, left column). The open blue circles at the corners of the blue rectangle are positions of enhanced surface contour activity whose cause, and function, will be explained below.
Surface contours are not generated at positions where open boundaries occur, as in response to the boundary Gap image in Fig. 16 (middle row, right column), because the surface filling-in that is caused by feature contours within regions with open boundaries can spread to both sides of those boundaries, and thus does not generate large contrasts at boundary positions.
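The contrast-sensitivity just described can be caricatured in a few lines (a toy operator, not the model's shunting on-center off-surround equations): subtract the mean of each cell's neighbors from the cell itself and half-wave rectify. Applied to a uniformly filled-in square, the output is zero in the interior, positive along the edges, and largest at the corners, reproducing the enhanced corner activity mentioned above.

```python
def surface_contour(a):
    """Rectified center-minus-surround contrast of a 2-D activity grid;
    nonzero only where filled-in activity changes rapidly across space."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            nbrs = [a[u][v]
                    for u, v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                    if 0 <= u < n and 0 <= v < n]
            out[x][y] = max(0.0, a[x][y] - sum(nbrs) / len(nbrs))
    return out

# A successfully filled-in square surface (cf. the No Gap image):
filled = [[1.0 if 2 <= x <= 4 and 2 <= y <= 4 else 0.0 for y in range(7)]
          for x in range(7)]
contour = surface_contour(filled)
```

Here `contour[3][3]` (interior) is 0.0, `contour[2][3]` (edge) is 0.25, and `contour[2][2]` (corner) is 0.5. A surface whose color has leaked through a boundary gap equalizes with its surround and generates no contour output.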
Surface contours realize complementary consistency and initiate figure–ground separation
Surface contours can support both complementary consistency and figure–ground separation using the property, shown in Fig. 16, that surface contours form around filled-in surfaces that are surrounded by closed boundaries, but not around surfaces whose color and brightness can flow out of big boundary gaps. How this property helps to realize complementary consistency is clarified by da Vinci stereopsis percepts of 3-D scenes that each eye can see to different degrees. This often occurs during viewing of objects in natural scenes when a nearer object occludes part of the surface of a farther object (Cao & Grossberg, 2005, 2012; Nakayama & Shimojo, 1990).
In order to fill in the red picture at this farther depth, the brain first needs to create a closed boundary around it at this depth. However, only the left vertical boundary of the red picture is binocularly seen. How do the picture’s other three boundaries get created? In particular, how does the brain decide to what depth, or depths, the other boundaries, which do not generate strong depth signals, should be assigned? Grossberg (1994) predicted that such boundaries are assigned to all depths along their lines of sight in the V2 interstripes where binocular boundaries are computed (see Fig. 3). Yazdanbakhsh and Watanabe (2004) published psychophysical experiments that support this prediction by showing an “asymmetry between horizontal and vertical illusory lines in determining the depth of their embedded surface” (p. 2621).
These completed boundaries are topographically projected to the V2 thin stripes, where they control surface filling-in. Figure 20 (top row, right column) shows that only the closed rectangular boundary at Depth 1 can contain the filling-in of the picture’s red color. The open boundary at Depth 2 allows color to spread to both of its sides, as in the Gap image of Fig. 16. As a result, surface contours form only at Depth 1 at the same positions where boundaries, acting as filling-in barriers, block the spread of the filling-in process (cf. Fig. 16, middle row, left column). This rectangular surface contour is depicted in the V2 thin stripes of Fig. 20 as a blue rectangle.
As shown in Fig. 20, these surface contours deliver topographic feedback signals to the boundary representations that generated them. This is the feedback process that achieves complementary consistency. It is accomplished by an on-center off-surround network that is depicted by a downward-and-leftward green arrow (labeled contrast-sensitive excitation) and a leftward red arrow (labeled contrast-sensitive inhibition) in row two of Fig. 20. The on-center signals strengthen the boundaries that generated the successfully filled-in surface. This strengthened boundary at Depth 1 is depicted as a black rectangle in Fig. 20 (middle column, second row) in the V2 pale stripe representation that is labeled After Feedback. The inhibitory connections in the off-surround act within position and across depth and thereby inhibit redundant boundaries at the same positions but farther depths. This inhibitory process is called boundary pruning. The inhibited inverted C boundary at Depth 2 is depicted in light gray in Fig. 20 (middle column, second row).
This off-surround network from the nearer Depth 1 to the farther Depth 2 is an example of the asymmetry between near and far, which develops from experience because, among other things, we can walk forward but not backward, at least most of the time.
Complementary consistency is hereby realized by confirming and strengthening the boundaries that lead to successful surface filling-in while inhibiting those that do not.
Figure 20 also indicates how complementary consistency enables figure–ground separation to begin: By eliminating all redundant boundaries of an occluding object at farther depths (e.g., Depth 2), collinear boundaries that abut the occluding object at these depths can be amodally completed behind it, as in Fig. 17. Figure 20 does not explain all that has to happen for figure–ground separation to be completed. One also needs to explain how, in response to an image like the three abutting rectangles in Fig. 17, the vertical boundaries where the smaller rectangles touch the vertical occluding rectangle come to “belong” to the occluding rectangle, while being detached from the smaller rectangles, and how this event drives the representations of both smaller abutting rectangles to a farther depth plane, in this case, Depth 2. After that happens, due to the boundary pruning shown in Fig. 20, their occluded boundaries can be collinearly completed behind the vertical occluding rectangle, as in Fig. 18 (top row, right column).
The FACADE model explains how this boundary separation and completion process occurs in cortical area V2, and uses the same bipole cells that complete boundaries of objects that are occluded by the retinal veins and blind spot (see Figs. 13 and 14). Then, direct pathways from V2 to higher cortical areas such as inferotemporal (IT) cortex, and back, are used to recognize this completed perceptual representation as part of a feature–category resonance (see Fig. 1), despite the fact that the occluded part of this rectangle is not consciously seen. Such recognition without seeing is said to be amodal.
Why do not all occluders look transparent?
If the completed boundary and surface behind the vertical rectangle could also be seen, then the vertical rectangle would look transparent, because both the horizontal rectangle, and the vertical rectangle in front of it, could be seen at the same spatial positions. If the completed parts of partially occluded objects could always be seen, then all occluders would look transparent. Confusion could then occur in the planning of looking and reaching behaviors because it could seem natural to reach directly through occluding objects to the occluded objects behind them. There is thus a design tension during evolution between the requirements of recognition and reaching. Conscious visibility enables the unoccluded parts of many surfaces to appear opaque, and thus good targets for reaching, without eliminating the ability of the visual cortex to correctly represent surfaces that are, in fact, transparent.
Completed V2 occluded regions are amodal, whereas unoccluded V4 regions are visible
The FACADE model predicts how cortical areas V2 and V4 work together to ensure that not all occluding objects look transparent: Cortical area V2 is proposed to complete object boundaries and surfaces of occluded object regions that may be amodally recognized, but not seen. Animals who could not recognize such partially occluded objects, such as a predator that is partially occluded by vegetation, would be at a severe survival disadvantage compared with those who could. Cortical area V4 is predicted to be the cortical region where figure–ground-separated 3-D surface representations of the unoccluded regions of opaque object regions are completed, and thereupon support both seeing and recognition of these regions (see Fig. 18). These unoccluded object surface regions are the parts of a scene that are typically consciously seen as we explore the world, and are used to control looking and reaching movements. The same model neural mechanisms also explain how V4 supports seeing of 3-D surfaces that really are transparent (Grossberg & Yazdanbakhsh, 2005).
The hypothesis that V4 represents 3-D surfaces whose objects have been separated from one another in depth is consistent with several different types of neurobiological experiments (e.g., Chelazzi, Miller, Duncan, & Desimone, 2001; Desimone & Schein, 1987; Lueck et al., 1989; Ogawa & Komatsu, 2004; Reynolds, Pasternak, & Desimone, 2000; Schiller & Lee, 1991; Zeki, 1983). Additional experiments that distinguish between recognizing and seeing occluding and occluded object regions are much to be desired.
9. Surface–shroud resonances between V4 and PPC control conscious seeing and action
A surface–shroud resonance (see Table 1a) is thus assumed to be triggered between V4 and PPC because V4 is predicted to be the cortical stage at which figure–ground-separated 3-D surface representations of unoccluded surface regions are computed. Such a surface–shroud resonance provides a conscious surface visibility signal to mark the opaque unoccluded surface regions to which orienting eye movements and reaching arm movements can be successfully directed.
10. Feature–category resonances for recognition and surface–shroud resonances for seeing
11. Solving the view-to-object binding problem during free scanning of a scene
An invariant recognition category is one for which the same small set of cells responds to different views, positions, and sizes of an object’s retinal images. Learning invariant categories enables our brains to avoid the combinatorial explosion of memories and search times that would be needed if a different category had to be learned and searched for each of an object’s retinal images, and if all of these categories then had to be associatively linked to generate recognition responses, such as the name of the object.
In order to explain how invariant object categories could be learned, it was necessary to first propose a solution of the view-to-object binding problem. This problem arises because, as our eyes scan a scene, two successive eye movements may focus on different parts of the same object or on different objects. How does the brain avoid learning to erroneously classify views of different objects together, and do so without an external teacher? For example, suppose that the eyes sequentially scan a face, bird, and cloud in a natural scene. Why does not the brain learn to associate them all with the same invariant object category?
Surface–shroud resonances were discovered as a key brain design for regulating what exemplars in a scene could be associated through learning with an emerging invariant object category. Only after this regulatory role for a surface–shroud resonance in invariant category learning was articulated did it gradually become clear that this was the type of resonance that I had been seeking for many years in response to my predictions that “all conscious states are resonant states” (e.g., Grossberg, 1980) and that “all consciously visible qualia are surface percepts” (e.g., Grossberg, 1994). Putting these two assertions together led to the question: What kind of resonance supports conscious percepts of visible qualia? How do we consciously see?
As modeling invariant category learning progressed, it became clear that surface–shroud resonances had the requisite properties. That realization enabled a deeper understanding of how feature–category resonances for recognition interact with surface–shroud resonances for seeing, and of how surface–shroud resonances for seeing select surface representations that could be used to direct looking and reaching, as in Figs. 23 and 24.
First, a surface–shroud resonance maintains sustained spatial attention upon an object’s surface. Functional neuroimaging data in humans suggest that a region in the left posterior intraparietal sulcus (IPS) “may be involved in continuously maintaining the current state of attention” (Yantis et al., 2002), a conclusion that was also supported by Corbetta, Kincade, Ollinger, McAvoy, and Shulman (2000). Chiu and Yantis (2009) reported additional evidence for a surface–shroud resonance between V4 and PPC, notably “increased activation in extrastriate cortex and posterior intraparietal sulcus (IPS) contralateral to the locus of spatial attention” (p. 3933). Clinical patients cannot maintain sustained attention after they experience suitable parietal lesions, including lesions of the inferior parietal lobe, or IPL (Husain & Nachev, 2007; Rueckert & Grafman, 1998).
Second, an active shroud inhibits a population of tonically active reset cells in the parietal cortex (see Fig. 25). When the shroud shuts off, as occurs when spatial attention shifts from one object to another, these reset cells are disinhibited. They can then generate a transient burst of activation that inhibits any invariant object category in ITa that may be active at that time. Learning an invariant category of the newly attended object can then commence, without interference from the previously active category.
This prediction was supported by experiments by Chiu and Yantis (2009) that used rapid event-related fMRI in humans. These authors found that a shift of spatial attention evokes a transient domain-independent signal in the medial superior parietal lobule (SPL in Fig. 25) that corresponds to a shift in categorization rules. In the ARTSCAN model, collapse of an attentional shroud (spatial attention shift) in IPS disinhibits the parietal SPL reset mechanism (transient signal) that leads to inhibition in ITa of the active invariant object category and instatement of a new one (shift in categorization rules).
This transient parietal signal is “domain-independent” in the model because the parietal reset mechanism can be inhibited by spatial attention in PPC that focuses upon any object surface, and can reset any active invariant category in ITa when it is disinhibited. In other words, the category reset population of cells in medial SPL is predicted to receive converging inhibitory signals from many parts of PPC, and to emit diverging inhibitory signals to many parts of ITa. This experiment provides a useful marker for experimentally testing additional properties of the ARTSCAN model and its variants.
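The shroud–reset–category interaction can be summarized as a toy state machine (my own abstraction of the ARTSCAN dynamics, not its differential equations): while a shroud is active it inhibits the reset cells and incoming views bind to the current category; when the shroud collapses, the disinhibited transient burst clears the category before the next object is attended.

```python
def simulate(timeline):
    """timeline: list of (shroud_on, attended_object) pairs, one per step.
    Returns the invariant category that is active at each step."""
    history, category, prev_shroud = [], None, False
    for shroud_on, obj in timeline:
        burst = prev_shroud and not shroud_on  # reset cells disinhibited
        if burst:
            category = None                    # transient burst resets ITa
        if shroud_on and category is None:
            category = obj                     # a new invariant category forms
        history.append(category)
        prev_shroud = shroud_on
    return history

# Attend a face, shift attention, then attend a bird:
trace = simulate([(True, "face"), (True, "face"), (False, None),
                  (True, "bird"), (True, "bird")])
```

The trace is ['face', 'face', None, 'bird', 'bird']: views of the bird are never bound to the face category. Deleting the burst step makes the simulation commit exactly the error described in Section 11, merging all attended objects into one category.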
12. Surface–shroud resonance enables saccades to foveate salient features of an attended object
Surface contours compute salient features and attention pointers
In addition to achieving complementary consistency and initiating figure–ground separation, surface contours also compute target positions on an attended surface to which saccadic eye movements can be directed. This last property arises because a surface contour is generated by a contrast-sensitive on-center off-surround network that operates across space upon the filled-in surface contrasts of a given object, as was explained in Section 8.
As a result, the positions of salient features—such as positions where the curvature of the surface’s bounding contour changes quickly—are more active in a surface contour. The white circles at the corners of the filled-in rectangle at the bottom of Fig. 16 illustrate this property. Figure 20 also depicts how, using surface-to-boundary feedback signals, such a salience-sensitive surface contour strengthens its generative boundary while suppressing redundant boundaries at the same positions but at farther depths, thereby achieving complementary consistency and initiating figure–ground separation. These salient features have properties of what Cavanagh, Hunt, Afraz, and Rolfs (2010) called attention pointers because, as will now be explained using Fig. 26, these salient positions become the target positions of saccadic eye movements as the locus of attention shifts predictively across the object surface.
From salient features to target positions: One role of V3A
Figure 26 summarizes the feedback loop that occurs within V2 between completed boundaries and filled-in surfaces, with surface contours closing the loop from surfaces in the thin stripes of V2 to boundaries in the interstripes of V2 (see Fig. 3). This is the feedback loop that is depicted in somewhat greater detail in Fig. 20. Figure 26 shows that outputs from V2 surface contours activate two parallel pathways. One pathway is the one depicted in Fig. 20. The other pathway chooses the most active position on a surface contour using a recurrent on-center off-surround winner-take-all network (Grossberg, 1973, 1980). The chosen position is the target position of the next saccade.
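A minimal simulation of such a winner-take-all choice is sketched below, using the shunting recurrent competition of Grossberg (1973) with a faster-than-linear signal function f(x) = x²; the parameter values are illustrative, not fitted.

```python
def winner_take_all(x, A=0.1, B=1.0, dt=0.05, steps=4000):
    """Recurrent on-center off-surround shunting network:
    dx_i/dt = -A*x_i + (B - x_i)*f(x_i) - x_i * sum_{j != i} f(x_j),
    with faster-than-linear f(x) = x**2, which quenches all but the winner."""
    f = lambda v: v * v
    x = list(x)
    for _ in range(steps):
        s = sum(f(v) for v in x)
        x = [v + dt * (-A * v + (B - v) * f(v) - v * (s - f(v))) for v in x]
    return x

# Surface-contour activities at three candidate target positions:
chosen = winner_take_all([0.6, 0.3, 0.2])
```

The most active position is contrast-enhanced toward the stable equilibrium (B + sqrt(B² − 4A))/2 ≈ 0.89 while the other activities are quenched to zero; the surviving position becomes the target of the next saccade.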
This transformation is predicted to occur between V2 and V3A. It must occur after V2 because it is in V2 that figure–ground separation occurs. The claim that V2 carries out figure–ground separation is supported by many experiments, notably the neurophysiological data about V2 in articles from the von der Heydt laboratory (e.g., O’Herron & von der Heydt, 2009; Qiu & von der Heydt, 2005; Qiu, Sugihara, & von der Heydt, 2007; von der Heydt, Zhou, & Friedman, 2000; Zhang & von der Heydt, 2010; Zhou, Friedman, & von der Heydt, 2000) that are given a unified explanation in Grossberg (2016a).
Regulating saccades to foveate on salient features of an attended object must occur after V2 because it is only after figure–ground separation occurs that attention can be focused on a prescribed object surface. V3A begins to transform visual representations to motor commands (Backus, Fleet, Parker, & Heeger, 2001; Caplovitz & Tse, 2007; Galletti & Battaglini, 1989; Nakamura & Colby, 2000). Indeed, Caplovitz and Tse (2007) have written that “neurons within V3A . . . process continuously moving contour curvature as a trackable feature . . . not to solve the ‘ventral problem’ of determining object shape, but in order to solve the ‘dorsal problem’ of what is going where” (p. 1179).
The target position commands from V3A also activate two parallel pathways (see Fig. 26). One pathway sends signals to the lateral intraparietal area (LIP) of the PPC, which, in turn, projects to the frontal eye fields (FEF) and the superior colliculus (SC) to generate saccadic eye movements to the chosen target position (Andersen, Brotchie, & Mazzoni, 1992; Bisley, Mirpour, Arcizet, & Ong, 2011; Blatt, Andersen, & Stoner, 1990; Goldberg, 2001; Nakamura & Colby, 2002; Olson & Colby, 2013; Paré & Wurtz, 2001; Snyder, 2000; Snyder et al., 2000). Along this route, LIP also projects to regions like the anterior intraparietal cortex, or AIP, that is used to control grasping movements (Battaglia-Mayer & Caminiti, 2009; Cohen & Andersen, 2002; Crawford, Medendorp, & Marotta, 2004; Nakamura et al., 2001), which will be further discussed in Section 13.
From salient features to gain fields and predictive remapping: Another role of V3A
The second pathway that receives target position commands is a gain field (Andersen et al., 1985, 1987; Andersen & Mountcastle, 1983; Deneve & Pouget, 2003; Fazl et al., 2009; Gancarz & Grossberg, 1999; Grossberg & Kuperstein, 1986; Pouget, Dayan, & Zemel, 2003) that operates between surface representations in V4 and spatial attentional shrouds in PPC. This gain field is a population of cells that is activated by target position signals and used to transform the retinotopic coordinates of an attended surface into the head-centered coordinates of its attentional shroud.
Why are shrouds computed in head-centered coordinates? The need for this arises from the fact that we consciously see visual surface qualia that are computed in retinotopic coordinates. In other words, we see whatever the eyes currently foveate in the center of our view, with previously foveated parts of a scene shifted to positions that lie in a direction opposite to that of the last eye movement. This state of affairs raises the following question: When a large eye movement occurs on an object surface, why does not the newly foveated position sometimes lie outside the shroud, thereby causing the shroud to collapse? Such a collapse would disinhibit category reset cells, which can then inhibit the emerging invariant category, thereby preventing invariant object category learning from proceeding. Somehow the currently active shroud must remain stable as the eyes explore the surface of one object. The ARTSCAN model proposes that this is accomplished by computing shrouds in head-centered coordinates that do not move when the eyes move. The transformation from an attended surface in retinotopic coordinates to its attentional shroud in head-centered coordinates is accomplished by a gain field.
Predictive remapping keeps the shroud in stable head-centered coordinates during saccades
The (target position)-to-(gain field) signals that update a head-centered shroud occur very quickly, before an eye movement is complete, to preserve the shroud’s head-centered representation during the eye movement. This process is called predictive remapping. Predictive remapping describes neurophysiological data about how parietal representations are updated by intended eye movements (Duhamel, Colby, & Goldberg, 1992; Gottlieb, Kusunoki, & Goldberg, 1998; Mathot & Theeuwes, 2010; Melcher, 2007, 2008, 2009; Saygin & Sereno, 2008; Sommer & Wurtz, 2006; Tolias et al., 2001; Umeno & Goldberg, 1997).
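In schematic form (a deliberate simplification of the gain-field models cited above, using additive rather than multiplicative gain modulation), the transform adds an eye-position signal to each retinotopic position, and predictive remapping updates that signal from the outflow saccade command before the eye lands:

```python
def to_head_centered(retinotopic, eye):
    """Gain-field combination of a retinotopic position with an eye-position
    signal; a stand-in for the multiplicative gain fields found in PPC."""
    return (retinotopic[0] + eye[0], retinotopic[1] + eye[1])

feature = (10, 5)                    # attended feature, head-centered
eye = (0, 0)
retina = (feature[0] - eye[0], feature[1] - eye[1])  # (10, 5) on the retina
shroud_position = to_head_centered(retina, eye)      # shroud stores (10, 5)

# Predictive remapping: the outflow command for a saccade to (10, 5)
# updates the eye-position signal BEFORE the movement completes, so that
# when the feature is foveated at retinal position (0, 0) ...
planned_eye = (10, 5)
new_retina = (feature[0] - planned_eye[0], feature[1] - planned_eye[1])
# ... the shroud's head-centered representation has not moved:
remapped = to_head_centered(new_retina, planned_eye)
```

The retinotopic image shifts with every saccade, but the sum of retinal and eye positions stays put, so the shroud never collapses mid-scan and the category reset cells remain inhibited.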
Both retinotopic and spatial coordinates are needed during active vision
Explaining how the visual world appears to remain stable as our eyes actively scan a scene requires interacting combinations of retinotopic and head-centered, or spatial, representations. Functional neuroimaging (fMRI) data of Burr and Morrone (2011) illustrate this subtlety: “We firstly report recent evidence from imaging studies in humans showing that many brain regions are tuned in spatiotopic [head-centered] coordinates, but only for items that are actively attended” (p. 504). These data are consistent with properties of “attention pointers” that rapidly update gain fields via predictive remapping to maintain spatiotopic shroud stability during eye movements that scan an attended object, even while the conscious visual representation of the object surface is computed in retinotopic coordinates that move around with the eyes.
Exploring an attended object surface with saccades: Why eyes do not move randomly
Putting together all of these observations provides an explanation of how our eyes can explore salient features of an attended object using sequences of saccadic eye movements. Figure 26 helps to keep the relevant interactions in mind. First suppose that a surface–shroud resonance is active between V4 and the IPS in the PPC. Due to the top-down excitatory signals to the attended surface representation in V4, the contrast of this surface is increased (see Fig. 22). As a result, its surface contours are also strengthened due to the contrast-sensitivity of the on-center off-surround network that generates them. The salient feature positions on these surface contours are correspondingly strengthened (see Fig. 16), thereby enabling these positions to more easily win the competition to determine the next target position of a saccade (see Fig. 26).
After a target position is chosen, it generates a saccade command to LIP and subsequent saccadic movement centers such as the frontal eye fields (FEF) and the superior colliculus (SC), while also rapidly updating the gain field that will keep the attentional shroud stably maintained in head-centered coordinates when this eye movement is executed (see Fig. 26). As the target position generates these excitatory output signals, it also sends an inhibitory signal back to its source to prevent its perseverative performance. This kind of self-inhibition, or inhibition of return, has often been used in neural models of how saccade sequences are recalled (e.g., Grossberg & Kuperstein, 1986, Chapter 9; Silver, Grossberg, Bullock, Histed, & Miller, 2011), as well as, more generally, in models of how sequences of stored items in cognitive, motor, and spatial working memories are recalled (e.g., Grossberg & Pearson, 2008). Then, the next most active surface contour position can be chosen, and the saccadic cycle repeats itself until all the attended salient features are foveated, or the surface–shroud resonance collapses.
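The resulting choose–saccade–inhibit cycle can be condensed into a few lines (a schematic of the recall dynamics, not the full circuit of Fig. 26): the most active surface-contour position wins the competition, a saccade command is issued, and self-inhibition removes that position so the next most salient feature can win.

```python
def scan(salience):
    """salience: {position: surface-contour activity} on the attended surface.
    Returns the order in which salient features are foveated."""
    active = dict(salience)
    order = []
    while active:
        target = max(active, key=active.get)  # winner-take-all choice
        order.append(target)                  # command routed to LIP -> FEF/SC
        del active[target]                    # inhibition of return
    return order

# Corner saliences of an attended rectangle (cf. Fig. 16, bottom left):
sequence = scan({(2, 2): 0.9, (2, 4): 0.8, (4, 2): 0.7, (4, 4): 0.6})
```

The corners are visited in order of descending salience, and the loop ends when no positions remain; in the full model, the cycle instead ends when the shroud collapses and attention shifts to another object.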
Surface–shroud collapse can occur because the transmitters that multiplicatively gate the bottom-up and top-down excitatory signals that maintain the resonance habituate in an activity-dependent way, and/or the last saccade brings the eye closer to another object that can generate a stronger, nonhabituated, surface–shroud resonance. In this way, the eyes can continue to search different objects in a scene, and can inspect salient cues on each item before saccading to the next one. This search cycle coordinates processes of spatial and object attention, figure–ground separation, predictive remapping, invariant object category learning, and visual search.
This cycle also shows why saccades do not just randomly explore a novel scene. If they did, it would not be possible to learn view-invariant object categories. The ability of saccades to sequentially explore different views of an attended object even in novel scenes has been supported by Theeuwes, Mathot, and Kingstone (2010), whose psychophysical data show that “the eyes prefer to stay within the same object” (p. 597).
Both transient and sustained parietal representations regulate attention
Different parts of PPC operate with different time scales that vary between sustained and transient. Sustained attention occurs between the inferior parietal sulcus (IPS) and V4 during a surface–shroud resonance (Chiu & Yantis, 2009; Corbetta et al., 2000; Yantis et al., 2002). The lateral intraparietal area (LIP) begins the conversion of a target position command into a saccade, and is reset to instate the next command even while the surface–shroud resonance persists. Finally, when the surface–shroud resonance does collapse, this shift of spatial attention causes a transient parietal reset burst in the medial superior parietal lobule (SPL; Chiu & Yantis, 2009).
13. From head-centered looking to body-centered motor-equivalent reaching sequences and tools
The circuits in Figs. 20, 25, and 26 clarify how sequences of salient target positions in head-centered coordinates can be chosen as an individual pays spatial attention to, and inspects, a novel or familiar object, leading to invariant object category learning, recognition, and visual search. Once these basic insights are available, they can be combined with other, compatible, modeling studies that explain, in addition, how head-centered spatial coordinates are transformed through learning into body-centered spatial coordinates, both to control movement-invariant shrouds for invariant object category learning and to control arm reaching movements to the same attended positions in space to which the eyes move (Y. E. Cohen & Andersen, 2002; Deubel, Schneider, & Paprotta, 1998; Schiegg, Deubel, & Schneider, 2003; Schneider & Deubel, 2002). How such a body-centered representation may be learned in real time using outflow neck position signals in addition to the outflow eye target position signals in Fig. 27 has been modeled in Guenther, Bullock, Greve, and Grossberg (1994). These body-centered spatial representations may be used to learn to control motor-equivalent arm movements. The DIRECT model of Bullock, Grossberg, and Guenther (1993) simulates motor-equivalent reaches that are accurate on the first try, even if the elbow is clamped at a fixed angle, just so long as the target is still within the arm’s workspace. They are also accurate on the first try if the target is reached with a tool under visual guidance, without measuring the length of the tool or its orientation in the hand, despite the fact that the tool constitutes an additional “limb” that has been added to the hand without any additional learning. The DIRECT model circuit is shown in Fig. 27.
Such a spatial affordance for tool use arises automatically in the model after it learns a representation of the space around it using a circular reaction, which is a principal way that reaching behaviors are learned in children (Piaget, 1945, 1951, 1952). It is called a “spatial” affordance for tool use because a representation of the space around the child is learned, and this spatial representation is downloaded into a command to move any limb to the desired target position. The human ability to use tools may thus have arisen from basic properties of how visually guided reaches in space are learned.
All babies normally go through a babbling phase, and it is during such a babbling phase that a circular reaction can be learned. During a visual circular reaction, babies endogenously babble, or spontaneously generate, hand/arm movements to multiple positions around their bodies. As their hands move in front of them, their eyes automatically, or reactively, look at their moving hands. While the baby’s eyes are looking at its moving hands, the baby learns an associative map from its hand positions to the corresponding eye positions, and from eye positions to hand positions. Learning of the map between eye and hand in both directions constitutes the “circular” reaction.
After map learning occurs, when a baby, child, or adult looks at a target position with its eyes, this eye position can use the learned associative map to activate a movement command to reach the corresponding position in space. If the volitional will to act is activated by opening the correct basal ganglia gate, then the selected hand/arm can reach to the foveated position in space under volitional control.
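The bidirectional eye–hand map learning described above can be sketched in a few lines of code. This is a deliberately minimal caricature, assuming a linear eye-to-hand relation and a simple delta (error-correcting) learning rule; the class and function names are illustrative and are not part of any published model's equations:

```python
import random

def babble_step():
    """One endogenous babbling movement: a random hand position,
    which the eyes reactively foveate (so gaze tracks the hand)."""
    hand = [random.uniform(-1, 1), random.uniform(-1, 1)]
    gaze = hand[:]  # the eyes automatically look at the moving hand
    return hand, gaze

class LinearMap:
    """A 2x2 linear associative map trained by a delta rule."""
    def __init__(self):
        self.w = [[0.0, 0.0], [0.0, 0.0]]

    def predict(self, x):
        return [sum(self.w[i][j] * x[j] for j in range(2)) for i in range(2)]

    def learn(self, x, target, rate=0.1):
        y = self.predict(x)
        for i in range(2):
            for j in range(2):
                self.w[i][j] += rate * (target[i] - y[i]) * x[j]

random.seed(0)
eye_to_hand = LinearMap()
hand_to_eye = LinearMap()
for _ in range(2000):  # babbling phase: learn the map in both directions
    hand, gaze = babble_step()
    eye_to_hand.learn(gaze, hand)
    hand_to_eye.learn(hand, gaze)

# After learning, foveating a target position activates a reach command
# toward that same position -- the "circular" reaction closed into a loop.
target_gaze = [0.5, -0.3]
reach = eye_to_hand.predict(target_gaze)
```

After the babbling phase, `reach` lies close to the foveated target, illustrating how a learned associative map lets a looked-at position select a reach command without any explicit calibration step.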
The DIRECT model begins to learn a circular reaction that is energized by an Endogenous Random Generator, or ERG (see Fig. 27). During the circular reaction, DIRECT learns how to combine the target position on the retina, the position of the eyes in the head, and the position of the head in the body into a representation of the position of the target in space. This spatial position can then be used to learn how to accurately reach with any of several motor effectors, which is the property of motor-equivalence, as well as with a tool. DIRECT hereby demonstrates how the spatial affordance for tool use, one of the most important foundations of human societies, is an automatic consequence of a brain’s ability to learn a circular reaction for motor-equivalent reaching in space. The caption of Fig. 27 explains the model properties that accomplish this.
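As a toy illustration of the coordinate combination that DIRECT performs, the sketch below treats each transform as vector addition in a common planar frame. The real model learns these transforms during the circular reaction; the function name and the additive simplification are assumptions made here for brevity:

```python
def body_centered_target(retinal_target, eye_in_head, head_in_body):
    """Combine a retinal target position with outflow eye-position and
    head-position signals into a body-centered target representation."""
    return tuple(r + e + h
                 for r, e, h in zip(retinal_target, eye_in_head, head_in_body))

# The same body-centered target can be downloaded to any effector --
# either arm, or an arm holding a tool -- which is the crux of motor
# equivalence: what is stored is a spatial goal, not a joint trajectory.
target = body_centered_target((0.1, 0.2), (0.3, -0.1), (0.0, 0.5))
```

Because the target is represented in body-centered space, clamping a joint or extending the hand with a tool changes only how the goal is realized, not the goal itself.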
This foundation enables the learning of sensory-motor skills. Sequences of eye saccades or arm reaching movements may be temporarily stored in an item-order-rank working memory in the prefrontal cortex before they are unitized through learning as sequence categories, plans, or list chunks by a masking field network (M. A. Cohen & Grossberg, 1986, 1987; Grossberg & Pearson, 2008; Silver et al., 2011). Such an item-order-rank working memory can store sequences of items or events that are repeated, as in the list ABACBD. Feedback interactions between an item-order-rank working memory and a masking field list chunking network enable stable learning of list chunks that can selectively respond to stored sequences of variable length. Activation of such a list chunk can read out previously learned sequences of skilled arm movements into working memory, from which they can be rehearsed under volitional control at variable speeds.
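The item-order-rank idea can be sketched concretely: items are stored as (item, rank) cells with a primacy gradient of activity, so repeated items such as the two A's in ABACBD occupy distinct rank-tagged cells, and readout proceeds by repeatedly choosing the most active cell and self-inhibiting it. The gradient shape and names below are illustrative assumptions, not the published network equations:

```python
def store(sequence, decay=0.9):
    """Store a sequence as (item, rank) cells whose activities form a
    primacy gradient: earlier items are stored with higher activity."""
    memory, rank_count, activity = {}, {}, 1.0
    for item in sequence:
        rank = rank_count.get(item, 0) + 1  # rank tags disambiguate repeats
        rank_count[item] = rank
        memory[(item, rank)] = activity
        activity *= decay
    return memory

def rehearse(memory):
    """Read out by repeatedly selecting the most active cell, then
    suppressing it, so items emerge in their stored order."""
    memory, out = dict(memory), []
    while memory:
        cell = max(memory, key=memory.get)
        out.append(cell[0])
        del memory[cell]
    return "".join(out)

recalled = rehearse(store("ABACBD"))  # recalled == "ABACBD"
```

A plain item-only working memory could not store ABACBD faithfully, because the second A and B would collide with the first; the rank tags are what make repeated items storable.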
Cognitive–emotional interactions that are sculpted during reinforcement learning and incentive motivational learning enable the choice of that list chunk which, in the current context, controls the arm movement sequence that is most likely to acquire a valued goal in the current environment (e.g., Dranias, Grossberg, & Bullock, 2008).
Huang and Grossberg (2010) have, moreover, shown how the spatial positions and objects that have previously been searched in a scene can be stored in parallel spatial and object working memories that enable subsequent movement choices to use the context of previous sequences of choices to move to the best positions and objects in a familiar scene. This ARTSCENE Search model shows how spatial working memories in parahippocampal cortex and dorsolateral prefrontal cortex interact with object working memories in perirhinal cortex and ventrolateral prefrontal cortex to realize these properties. Such concepts have enabled the ARTSCENE Search model to quantitatively simulate all the major types of data from the psychophysical literature on contextual cueing.
The most advanced model of how action sequences may be controlled by cognitive and cognitive–emotional processes is the predictive Adaptive Resonance Theory, or pART, model (Grossberg, 2018), which offers a unified neural theory of the prefrontal cortex and its functions. pART combines all of the above properties and, in addition, explains how working memory storage in the prefrontal cortex becomes selective, so that only task-relevant events influence cognitive processing and action choices. The scope of pART is illustrated by the following summary of its properties.
The pART model explains and simulates how prefrontal cortices play an essential role in working memory and cognitive–emotional processes through interactions with multiple brain regions. Recent neurobiological data about prefrontal properties such as desirability, availability, credit assignment, category learning, and feature-based attention are explained. These properties arise through interactions of orbitofrontal, ventrolateral prefrontal, and dorsolateral prefrontal cortices with the inferotemporal cortex, perirhinal cortex, parahippocampal cortex, ventral bank of the principal sulcus, ventral prearcuate gyrus, frontal eye fields, hippocampus, amygdala, basal ganglia, hypothalamus, and visual cortical areas V1, V2, V3A, V4, middle temporal cortex, medial superior temporal area, lateral intraparietal cortex, and posterior parietal cortex.
Model explanations also include how the value of visual objects and events is computed, which objects and events cause desired consequences and which may be ignored as predictively irrelevant, and how to plan and act to realize these consequences, including how to selectively filter expected versus unexpected events, leading to actions toward, and conscious perception of, expected events. Modeled processes include reinforcement learning and incentive motivational learning, object and spatial working memory dynamics, and category learning, including the learning of object categories, value categories, object-value categories, and sequence categories, or list chunks.
Multiple prediction error processes in the brain and in technology
The pART model includes a significant role for the basal ganglia in regulating brain dynamics, including how the substantia nigra pars compacta (SNc) and related areas can regulate learning in response to unexpected outcomes, or prediction errors; and how the substantia nigra pars reticulata (SNr) and related areas can open and close gates, by activating and deactivating volitional GO signals, that determine which thoughts, feelings, and actions will actually be realized. In so doing, pART builds upon a sequence of previous detailed modeling studies of these basal ganglia functions, and the data that they have explained and predicted (e.g., Brown, Bullock, & Grossberg, 1999, 2004; Dranias et al., 2008; Grossberg, 2016b; Grossberg, Bullock, & Dranias, 2008; Grossberg & Kishnan, 2018).
These articles also clarify that multiple brain regions use prediction errors to guide new learning. In addition to the basal ganglia, such brain regions include the thalamocortical and corticocortical feedback circuits, interacting with brain regions like the nonspecific thalamus and hippocampus, that enable ART circuits to learn new recognition categories in response to novel or unexpected events (Carpenter & Grossberg, 1993; Grossberg & Versace, 2008). The kinds of prediction error that are computed in the basal ganglia, nonspecific thalamus, and hippocampus differ from the mismatches that drive motor learning per se in the parietal and motor cortices (e.g., see Table 1b).
Other models have also proposed how prediction error modulates cortical coupling (e.g., den Ouden, Daunizeau, Roiser, Friston, & Stephan, 2010), using a Bayesian hierarchical learner to describe online inference. Bayesian Helmholtz machines, with their wake/sleep learning phases, likewise rest on Bayesian methods (e.g., Dayan & Hinton, 1996). The main utility of these models is in adaptive prediction applications. In contrast, the biological neural models that are described herein enable a detailed understanding of the neural architectures that can rapidly learn and perform such inferences in changing environments that are filled with unexpected events, while also solving the stability–plasticity dilemma along the way, and providing unified explanations and predictions of large amounts of interdisciplinary data. Because they are fully specified mathematically, these models can also be used in applications, as have many others that my colleagues and I have developed (cf. http://techlab.bu.edu/resources/articles/C5).
14. Concluding remarks
This article summarizes some basic reasons why feedback processes operate at all levels of the cerebral cortex and thalamus. To illustrate the general prediction that “all conscious events are resonant events,” the article has described some of the main cortical processing stages that enable computationally complementary boundary and surface representations to be completed and filled-in via a process of hierarchical resolution of uncertainty. Then a surface–shroud resonance consciously “lights up” a surface representation that is complete, context-sensitive, and stable enough to be used to direct successful looking and reaching behaviors. This analysis also distinguishes between feature–category resonances for knowing, or recognition, and surface–shroud resonances for seeing, and suggests how we can know about familiar objects that we see due to synchronization of these resonances via shared circuits in prestriate cortical areas V2 and V4.
Either of these kinds of resonances can generate top-down expectation signals from V4 to earlier cortical stages, even to the lateral geniculate nucleus (Gove, Grossberg, & Mingolla, 1995; Murphy & Sillito, 1987; Sillito, Jones, Gerstein, & West, 1994). As noted in Section 1, due to the way in which object attention works—via top-down, modulatory on-center, off-surround networks that embody the ART Matching Rule (Bhatt, Carpenter, & Grossberg, 2007; Carpenter & Grossberg, 1987, 1991; Grossberg, 1980, 2013), which is also sometimes called “biased competition” (Desimone, 1998; Kastner & Ungerleider, 2001; Reynolds & Heeger, 2009)—these top-down signals can select cell activations that are consistent with the resonating surface representation, while suppressing cell activations that are not, thereby selecting those lower-level representations that are compatible with the chosen action.
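The selective action of such a top-down expectation can be caricatured algebraically. In the sketch below, a top-down signal multiplicatively amplifies cells in its on-center (but cannot fire them by itself) while inhibiting active cells outside it; the gain values and per-feature form are illustrative assumptions, not the ART shunting equations themselves:

```python
def art_match(bottom_up, top_down, surround_gain=1.0):
    """Toy ART Matching Rule: a modulatory on-center with a driving
    off-surround, applied per feature for simplicity."""
    matched = []
    for b, t in zip(bottom_up, top_down):
        on_center = b * (1.0 + t)  # top-down alone (b = 0) cannot fire a cell
        off_surround = surround_gain * (1.0 - t) * b  # suppress the unexpected
        matched.append(max(0.0, on_center - off_surround))
    return matched

bottom_up = [1.0, 1.0, 0.0]  # features present in the scene
top_down  = [1.0, 0.0, 1.0]  # features the expectation predicts

out = art_match(bottom_up, top_down)
# Features that are both present and expected survive (out[0] > 0);
# present-but-unexpected features are suppressed (out[1] == 0); and
# expected-but-absent features stay silent (out[2] == 0), because the
# on-center is modulatory rather than driving.
```

This captures, in miniature, why biased competition selects lower-level activations consistent with the resonating representation without hallucinating features that the bottom-up input does not contain.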
The article also describes how feedback interactions among multiple cortical areas can direct sequences of saccadic eye movements to foveate salient features of an attended surface. Attention upon the surface is sustained via a surface–shroud resonance, which can also be consciously seen as a result. These cortical regions include both IPS and LIP within PPC, as well as V2, V3A, and V4 within the prestriate visual cortex. The different foveated object views can then trigger learning of view-specific object categories in cortical areas like ITp via feature–category resonances, which are then linked together by associative learning to create invariant object categories in cortical areas like ITa.
ART hereby provides a computational explanation of why both feature–category resonances and surface–shroud resonances are needed. In particular, perceptual and cognitive processes in the “what” ventral processing stream use excitatory matching and match-based learning (see Table 1b) to learn categorical representations of objects and events in the world using feature–category resonances. Match-based learning solves the stability–plasticity dilemma and can occur quickly without causing catastrophic forgetting, much as new faces can be learned quickly without forcing unselective forgetting of familiar faces.
Such match-based learning supports the creation of category representations at higher cortical levels that are increasingly invariant under changes in an object’s views, positions, and sizes. That is, match-based learning can support invariant category learning (see Section 6), which enables learning to categorize the world without causing a combinatorial explosion of memories. However, positionally invariant object category representations cannot, by themselves, be used to manipulate objects at particular positions in space.
That is why complementary spatial and motor processes in the “where/how” dorsal cortical processing stream are needed to focus spatial attention upon and manipulate objects in space. These processes often use VAM-like inhibitory matching and mismatch learning (Section 5, and Table 1b) to continually update spatial maps and sensory–motor gains whereby to control looking or reaching behaviors (see Fig. 24). These inhibitory circuits cannot support an adaptive resonance, and thus do not generate conscious states.
Neither the excitatory nor the inhibitory matching and learning processes in Table 1b can, on their own, support learning about the world and effective action upon it, but together they can. Perceptual and cognitive processes use excitatory matching and match-based learning to create self-stabilizing representations of objects and events that embody increasing expertise about the world, and conscious awareness of it. Complementary spatial and motor processes use inhibitory matching and mismatch learning to continually update spatial maps and sensory–motor gains to compensate for bodily changes throughout life. Together they provide a self-stabilizing perceptual and cognitive front end for conscious awareness and knowledge acquisition, which can intelligently manipulate the more labile spatial and motor processes that enable our changing bodies to act effectively upon a changing world.
- Andersen, R. A., Snyder, L. H., Batista, A. P., Buneo, C. A., & Cohen, Y. E. (1998). Posterior parietal areas specialized for eye movements (LIP) and reach (PRR) using a common coordinate frame. In G. R. Bock & J. A. Goode (Eds.), Sensory guidance of movement (Novartis Foundation Symposium 218) (pp. 109–128). Chichester, UK: Wiley.
- Battaglia-Mayer, A., & Caminiti, R. (2009). Posterior parietal cortex and arm movement. In L. R. Squire (Ed.), Encyclopedia of neuroscience (pp. 783–795). London, UK: Elsevier.
- Binsted, G., Brownell, K., Vorontsova, Z., Heath, M., & Saucier, D. (2007). Visuomotor system uses target features unavailable to conscious awareness. Proceedings of the National Academy of Sciences, 104, 12669–12672.
- Bullock, D., Grossberg, S., & Guenther, F. H. (1993). A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience, 3, 408–435.
- Burr, D. C., & Morrone, M. C. (2011). Spatiotopic coding and remapping in humans. Philosophical Transactions of the Royal Society B, 366, 504–515.
- Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54–115.
- Carpenter, G. A., & Grossberg, S. (1991). Pattern recognition by self-organizing neural networks. Cambridge, MA: MIT Press.
- Carpenter, G. A., & Grossberg, S. (1993). Normal and amnesic learning, recognition, and memory by a neural model of cortico-hippocampal interactions. Trends in Neurosciences, 16, 131–137.
- Cavanagh, P. (1986). Reconstructing the third dimension: Interactions between color, texture, motion, binocular disparity, and shape. Computer Vision, Graphics, and Image Processing, 37, 171–195.
- Chang, H.-C., Grossberg, S., & Cao, Y. (2014). Where’s Waldo? How perceptual, cognitive, and emotional brain processes cooperate during learning to categorize and find desired objects in a cluttered scene. Frontiers in Integrative Neuroscience. https://doi.org/10.3389/fnint.2014.0043
- den Ouden, H. E. M., Daunizeau, J., Roiser, J., Friston, K. J., & Stephan, K. E. (2010). Striatal prediction error modulates cortical coupling. The Journal of Neuroscience, 30, 3210–3219.
- Deubel, H., Schneider, W. X., & Paprotta, I. (1998). Selective dorsal and ventral processing: Evidence for a common attentional mechanism in reaching and perception. Visual Cognition, 5, 81–107.
- Evarts, E. V., & Tanji, J. (1974). Gating of motor cortex reflexes by prior instruction. Brain Research, 71, 479–494.
- Franklin, D. J., & Grossberg, S. (2017). A neural model of normal and abnormal learning and memory consolidation: Adaptively timed conditioning, hippocampus, amnesia, neurotrophins, and consciousness. Cognitive, Affective, & Behavioral Neuroscience, 17, 24–76.
- Gaudiano, P., & Grossberg, S. (1991). Vector associative maps: Unsupervised real-time error-based learning and control of movement trajectories. Neural Networks, 4, 147–183.
- Gaudiano, P., & Grossberg, S. (1992). Adaptive vector integration to endpoint: Self-organizing neural circuits for control of planned movement trajectories. Human Movement Science, 11, 141–155.
- Goldberg, M. E. (2001). Parietal lobe. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social & behavioral sciences. London, UK: Elsevier.
- Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20–25.
- Grossberg, S. (1973). Contour enhancement, short-term memory and constancies in reverberating neural networks. Studies in Applied Mathematics, 52, 217–257.
- Grossberg, S. (1984). Outline of a theory of brightness, color, and form perception. In E. Degreef & J. van Buggenhaut (Eds.), Trends in mathematical psychology (pp. 59–85). Amsterdam, Netherlands: North Holland.
- Grossberg, S. (2014). How visual illusions illuminate complementary brain processes: Illusory depth from brightness and apparent motion of illusory contours. Frontiers in Human Neuroscience. https://doi.org/10.3389/fnhum.2014.00854
- Grossberg, S. (2016a). Cortical dynamics of figure–ground separation in response to 2D pictures and 3D scenes: How V2 combines border ownership, stereoscopic cues, and Gestalt grouping rules. Frontiers in Psychology: Perception Science. Retrieved from http://journal.frontiersin.org/article/10.3389/fpsyg.2015.02054/full
- Grossberg, S. (2016b). Neural dynamics of the basal ganglia during perceptual, cognitive, and motor learning and gating. In J.-J. Soghomonian (Ed.), The basal ganglia: Novel perspectives on motor and cognitive functions (pp. 457–512). Berlin, Germany: Springer.
- Grossberg, S. (2017a). Acetylcholine neuromodulation in normal and abnormal learning and memory: Vigilance control in waking, sleep, autism, amnesia, and Alzheimer’s disease. Frontiers in Neural Circuits. https://doi.org/10.3389/fncir.2017.00082
- Grossberg, S. (2018). Desirability, availability, credit assignment, category learning, and attention: Cognitive–emotional and working memory dynamics of orbitofrontal, ventrolateral, and dorsolateral prefrontal cortices. Brain and Neuroscience Advances. Retrieved from http://journals.sagepub.com/doi/full/10.1177/2398212818772179
- Grossberg, S., & Kishnan, D. (2018). Neural dynamics of autistic repetitive behaviors and Fragile X syndrome: Basal ganglia movement gating and mGluR-modulated adaptively timed learning. Frontiers in Psychology, Psychopathology. https://doi.org/10.3389/fpsyg.2018.00269
- Grossberg, S., & Kuperstein, M. (1986). Neural dynamics of adaptive sensory-motor control: Expanded edition. Elmsford, NY: Pergamon Press.
- Grossberg, S., Palma, J., & Versace, M. (2015). Resonant cholinergic dynamics in cognitive and motor decision-making: Attention, category learning, and choice in neocortex, superior colliculus, and optic tectum. Frontiers in Neuroscience: Decision Neuroscience. Retrieved from http://journal.frontiersin.org/article/10.3389/fnins.2015.00501/full
- Grossberg, S., & Pilly, P. K. (2014). Coordinated learning of grid cell and place cell spatial and temporal properties: Multiple scales, attention, and oscillations. Philosophical Transactions of the Royal Society B, 369, 20120524.
- Grossberg, S., Srinivasan, K., & Yazdanbakhsh, A. (2014). Binocular fusion and invariant category learning due to predictive remapping during scanning of a depthful scene with eye movements. Frontiers in Psychology: Perception Science. Retrieved from http://journal.frontiersin.org/Journal/10.3389/fpsyg.2014.01457/full
- Heitger, F., & von der Heydt, R. (1993). A computational model of neural contour processing: Figure–ground segregation and illusory contours. Proceedings of the 4th International Conference on Computer Vision (pp. 32–40). Berlin, Germany.
- Kalaska, J. F., Caminiti, R., & Georgopoulos, A. P. (1983). Cortical mechanisms related to the direction of two-dimensional arm movements: Relations in parietal area 5 and comparison with motor cortex. Experimental Brain Research, 51, 247–260.
- Kastner, S., & Ungerleider, L. G. (2001). The neural basis of biased competition in human visual cortex. Neuropsychologia, 22, 751–761.
- Kelly, F. J., & Grossberg, S. (2000). Neural dynamics of 3-D surface perception: Figure–ground separation and lightness perception. Perception & Psychophysics, 62, 1596–1619.
- Llinas, R., Ribary, U., Contreras, D., & Pedroarena, C. (1998). The neuronal basis for consciousness. Philosophical Transactions of the Royal Society of London B, 353, 1841–1849.
- Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford, UK: Oxford University Press.
- Mishkin, M. (1982). A memory system in the monkey. Philosophical Transactions of the Royal Society of London B, 298, 85–95.
- Mishkin, M., Ungerleider, L. G., & Macko, K. A. (1983). Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences, 6, 414–417.
- Morris, R. G. M., & Frey, U. (1997). Hippocampal synaptic plasticity: Role in spatial learning or the automatic recording of attended experience? Philosophical Transactions of the Royal Society of London B: Biological Sciences, 1360, 1469–1503.
- Nakamura, H., Kuroda, T., Wakita, M., Kusunoki, M., Kato, A., Mikami, A., . . . Itoh, K. (2001). From three-dimensional space vision to prehensile hand movements: The lateral intraparietal area links the area V3A and the anterior intraparietal area in Macaques. The Journal of Neuroscience, 21, 8174–8187.
- Nakamura, K., & Colby, C. L. (2002). Updating of the visual representation in monkey striate and extrastriate cortex during saccades. Proceedings of the National Academy of Sciences, 99, 4026–4031.
- Olson, C. R., & Colby, C. L. (2013). Spatial cognition. In Fundamental neuroscience (4th ed., pp. 969–988). London, UK: Elsevier.
- Piaget, J. (1945). La formation du symbole chez l’enfant [Play, dreams and imitation in childhood]. Paris, France: Delachaux et Niestlé.
- Piaget, J. (1951). Play, dreams and imitation in childhood (C. Gattegno & C. F. M. Hodgson, Trans.). London, UK: Routledge and Kegan Paul.
- Piaget, J. (1952). The origins of intelligence in children. New York, NY: International Universities Press.
- Schiegg, A., Deubel, H., & Schneider, W. X. (2003). Attentional selection during preparation of prehension movements. Visual Cognition, 10, 409–431.
- Schneider, W. X., & Deubel, H. (2002). Selection-for-perception and selection-for-spatial-motor-action are coupled by visual attention: A review of recent findings and new evidence from stimulus-driven saccade control. In W. Prinz & B. Hommel (Eds.), Attention and performance XIX: Common mechanisms in perception and action (pp. 609–627). Oxford, UK: Oxford University Press.
- Schwartz, B. J., & Sperling, G. (1983). Luminance controls the perceived 3-D structure of dynamic 2-D displays. Bulletin of the Psychonomic Society, 21, 456–458.
- Singer, W. (1998). Consciousness and the structure of neuronal representations. Philosophical Transactions of the Royal Society B, 353, 1829–1840.
- Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior parietal cortex. Nature, 386, 167–170.
- Squire, L. R., & Cohen, N. J. (1984). Human memory and amnesia. In G. Lynch, J. McGaugh, & N. M. Weinberger (Eds.), Neurobiology of learning and memory (pp. 3–64). New York, NY: Guilford Press.
- Theeuwes, J., Mathot, S., & Kingstone, A. (2010). Object-based eye movements: The eyes prefer to stay within the same object. Attention, Perception, & Psychophysics, 72, 597–601.
- Varin, D. (1971). Fenomeni di contrasto e diffusione cromatica nell’organizzazione spaziale del campo percettivo [Phenomena of contrast and chromatic diffusion in the spatial organization of the perceptual field]. Rivista di Psicologia, 65, 101–128.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.