
Human visual perceptual organization beats thinking on speed


To what degree does knowledge influence visual perceptual processes? This question, which is central to the seeing-versus-thinking debate in cognitive science, is often discussed using examples claimed to be proof of one stance or another. It has, however, also been muddled by the use of different and unclear definitions of perception. Here, for the well-defined process of perceptual organization, I argue that bringing speed (or efficiency) into the equation opens a new perspective on the limits of top-down influences of thinking on seeing. While the input of the perceptual organization process may be modifiable and its output enrichable, the process itself seems so fast (or efficient) that thinking hardly has time to intrude and is effective mostly after the fact.


The seeing-versus-thinking debate in cognitive science has deep historical roots. One school of thinkers can be said to follow Leonardo da Vinci’s (1452–1519) motto “All our knowledge has its origins in perception”. This motto suggests that perception is a largely autonomous source of knowledge rather than knowledge being a resource for perception (Firestone & Scholl, 2015; Fodor, 1983; Pylyshyn, 1999; Rock, 1985). This is also called cognitive impenetrability, which means that vision is largely unaffected by other cognitive domains—not because it is neurally encapsulated but because it is a stable process, wired into the brain and not easily modifiable by knowledge, beliefs, or intentions. This is typically illustrated by visual illusions, which persist even when we know what we are looking at. Another school can be said to follow William Kingdon Clifford (1845–1879), who restricted seeing to sensations and argued that a sensation gives us ideas connected with things because of earlier hands-on experience with things that caused this sensation too (Clifford, 1890). This suggests that thinking transforms sensory input directly into meaningful concepts. It reverberates in the recent idea that, in the visual hierarchy (Footnote 1), all perceptually relevant information is represented by activity patterns in the primary, retinotopic, area V1 and that activity in other visual and non-visual areas is secondary or auxiliary but not representational (Gur, 2015).

Clifford’s idea does not include what is called the Høffding step. Harald Høffding (1843–1931) argued that there must be a stage of visual structuring, or perceptual organization, to transform a two-dimensional (2D) retinal image into a percept of three-dimensional (3D) objects arranged in space (Høffding, 1891). This idea attributes more autonomy to perception but also implies that a stimulus can be perceptually organized in many different ways. Therefore, the early 20th-century Gestaltists proposed the law of Prägnanz, which holds that the visual system settles in relatively stable organizations, characterized by symmetry and simplicity (Koffka, 1935; Köhler, 1929; Wertheimer, 1923). A modern version hereof is the simplicity principle. It defines the complexity of an organization by the amount of information needed to specify it, and holds that the visual system prefers simplest hierarchical organizations (Hochberg & McAlister, 1953; Leeuwenberg & van der Helm, 2013; van der Helm, 2014). It further postulates that the subsequent hierarchical levels in such an organization—say, from local features to global structures—are represented at subsequent levels in the visual hierarchy (van der Helm, 2012).

The Høffding step is also not included in Hermann von Helmholtz’s (1821–1894) idea that visual perception is a process of unconscious inference guided by the likelihood principle (von Helmholtz, 1909/1962). This principle holds that “we perceive the most likely objects or events that would fit the sensory pattern that we are trying to interpret” (Hochberg, 1978). It led, among other things, to the idea that the internal process of perception is veridical, meaning that it captures most truthfully the structure of the external world (e.g., Cohen, 2015; Pizlo, 2015). Visual illusions speak against this, and in fact, I think this veridicality claim is fundamentally unverifiable (see Appendix A). Be that as it may, the Helmholtzian likelihood principle is often taken as a permit to include knowledge in perception models. Some Bayesian models, for instance, test knowledge-based hypotheses against the sensory input (e.g., Friston, 2009). My problem with this is that it is basically a form of template matching, which, at least in human vision research, was abandoned long ago because it is too rigid and limited to deal with ill-defined categories and novel objects.

The foregoing illustrates that, in the seeing-versus-thinking debate, much depends on how perception is defined. Definitions of perception range from only V1 activity to any cognitive activity that contributes to arriving at unique percepts. In both these extreme options, knowledge plays a large part in determining what we think we see when looking at a visual stimulus—be it (unconscious) phylogenetic knowledge acquired during evolution or (conscious) ontogenetic knowledge acquired during one’s life. However, whereas thinking seems to be a relatively slow process that has been described as involving the sequential activation of sets of neural assemblies (Hebb, 1949), we can detect stimulus features like mirror symmetry under presentation times as short as 50 ms (Csathó, van der Vloed, & van der Helm, 2003; Locher & Wagemans, 1993), while complete percepts seem to be formed within less than 500 ms (Breitmeyer & Ogmen, 2006; Sekuler & Palmer, 1992). These temporal specifications are, admittedly, not necessarily indicative of temporal aspects of the cascade of perceptual processes triggered by a stimulus, but they do suggest that, for thinking to intrude into seeing, it might have a timing problem.

To explore this issue further, this article focuses on the process of perceptual organization. As indicated, perceptual organization is the neuro-cognitive process—in the visual hierarchy—that enables us to perceive scenes as structured wholes consisting of objects arranged in space (Fig. 1ab). This includes the perception of randomly organized spatial elements as well as elements that can be organized, in 2D or 3D, into a single object, multiple objects, partially hidden ones, etc. This presumably automatic process may seem to occur effortlessly in daily life, but by all accounts, it must be both complex and flexible. For a proximal stimulus, the perceptual organization process usually singles out one hypothesis about the distal stimulus from among a myriad of hypotheses that also would fit the proximal stimulus. This means, as Gray (1999) put it, that multiple sets of features at multiple, sometimes overlapping, locations in a stimulus must be grouped in parallel and that the process must cope with a large number of possible combinations simultaneously. This indicates that the combinatorial capacity of the perceptual organization process must be high, which is remarkable considering that it completes in just a few hundreds of milliseconds.

Fig. 1

Perceptual organization. a A stimulus with a typically perceived organization comprising the two triangular shapes in b, which, therefore, are called compatible parts. c An incompatible part, which is masked by the typically perceived organization and which, therefore, is called an embedded figure. (After Kastens & Ishikawa, 2006)

Perception is, admittedly, broader than perceptual organization, but the latter is a pivotal process between sensory input and percepts, so, it is relevant to explore how it might interact with top-down processes. In the next sections, I first sketch an earlier-presented model of perceptual organization, called PATVISH (Perception and ATtention in the VISual Hierarchy; for details, see van der Helm, 2012, 2014, 2016). Using this model, I then argue that the input of the perceptual organization process may be modifiable and its output enrichable, but that the process itself is so fast (or efficient) that it has done most of its job by the time thinking might interfere. By “most”, I mean that the perceptual organization process is not neurally encapsulated and that thinking might have time to intrude—but not much. Regarding the exact degree to which thinking might intrude, this study remains speculative, but its main objective nevertheless is to put speed (or efficiency) forward as a relevant factor in the seeing-versus-thinking debate.

Before I begin, two remarks are in order. First, theoretical studies aim to integrate empirical findings and theoretical ideas into coherent frameworks or to apply such a framework to address topical issues. PATVISH represents a proposed integration of ideas that have gained some sort of support (empirical or otherwise), and in this theoretical study, I apply this proposal to address issues in the seeing-versus-thinking debate. Theoretical research is not empirical research but is yet an integral part of the empirical cycle, and at the end of this article, I raise several empirical questions for future investigation. Second, a semantic problem in the seeing-versus-thinking debate is that thinking, or knowledge, is often discussed in terms of attentional effects, even though thinking and attention are not the same. Yet, attention seems the obvious channel through which thinking would affect seeing, and here, I therefore focus on effects of attention on perceptual organization. Through such effects, if any, one might infer effects of thinking.

Modeling perceptual organization

To account for the high combinatorial capacity of the perceptual organization process, PATVISH follows Lamme, Supèr, and Spekreijse (1998) in assuming that this distributed hierarchical process (Footnote 1) comprises three neurally intertwined but functionally distinguishable subprocesses. These subprocesses are taken to be responsible for (a) feedforward extraction of, or tuning to, features to which the visual system is sensitive, (b) horizontal binding of similar features, and (c) recurrent selection and integration of different features (Fig. 2, left-hand panel). Furthermore, adopting the simplicity principle, PATVISH assumes that the process yields a complexity distribution over candidate organizations (i.e., stimulus organizations in terms of whole and parts; Fig. 2, right-hand panel). Such a complexity C can be converted into a normalized probability $2^{-C}$, which reflects an organization’s probability of being perceived and implies that simpler organizations are more likely to be perceived.
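The conversion from complexities to probabilities can be illustrated with a minimal sketch. It assumes that normalization means dividing each weight 2^(-C) by the sum over all candidate organizations; the function name, the candidate labels, and the complexity values are hypothetical and serve only as illustration, not as part of PATVISH.

```python
def complexity_to_probability(complexities):
    """Map each candidate's complexity C to a probability proportional to 2**(-C).

    Assumption: normalization is done by dividing by the sum of all weights.
    """
    weights = {name: 2.0 ** (-c) for name, c in complexities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical complexities (arbitrary information units) for three candidates.
candidates = {"two_triangles": 4, "embedded_figure": 6, "unstructured": 9}
probs = complexity_to_probability(candidates)

# Print candidates from most to least likely to be perceived.
for name, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.3f}")
```

Under this assumption, each extra unit of complexity halves an organization's weight, so the simplest candidate dominates, while two candidates with equal complexity receive equal probability.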

Fig. 2

Processing in the visual hierarchy. A stimulus-driven perceptual organization process (at the left) comprises three intertwined subprocesses (further explained in the text), which, together, yield percepts in the form of hierarchical stimulus organizations (i.e., organizations in terms of wholes and their parts). A task-driven attention process (at the right) may scrutinize such a hierarchical organization—starting at higher levels where relatively global structures are represented, and if required by task and allowed by time, descending to lower levels where relatively local features are represented. (Reproduced from van der Helm, 2016)

The subprocess of feedforward extraction is reminiscent of the neuroscientific idea that, going up in the visual hierarchy, neural cells mediate detection of increasingly complex features (Hubel & Wiesel, 1968). Furthermore, the subprocess of recurrent selection and integration is reminiscent of the connectionist idea that, by parallel distributed processing (PDP), neural activation spreading yields percepts represented by stable activation patterns (Churchland, 1986). In PATVISH, these subprocesses interact like a fountain under increasing water pressure: As the feedforward extraction progresses along ascending connections, each passed level in the visual hierarchy forms the starting point of integrative recurrent processing along descending connections. This yields a gradual buildup from percepts of parts at lower levels in the visual hierarchy to percepts of wholes near its top end (for similar pictures, see Lee & Mumford, 2003; VanRullen & Thorpe, 2002).

Neural activation spreading, as a presumed manifestation of standard PDP, may be a basic phenomenon in the brain, but the brain also exhibits a more sophisticated phenomenon, namely, neuronal synchronization (see Appendix B). This is the phenomenon that neurons, in transient assemblies, temporarily synchronize their firing activity. Synchronization in the 30–70 Hz gamma band, in particular, has been associated with local visual computations, especially with feature binding in horizontal neural assemblies (Gilbert, 1992). PATVISH’s capstone now is its assumption that gamma synchronization is a manifestation of transparallel processing, which means that up to an exponential number of similar features are processed in one go, that is, simultaneously as if only one feature were concerned. The source of this assumption is sketched next (see Appendix C for more technical details).

In PISA—a minimal coding algorithm for strings (van der Helm, 2004, 2015)—regularities such as symmetries and repetitions are extracted to compute simplest hierarchical organizations. To this end, the algorithm implements formal counterparts of the three intertwined but functionally distinguishable subprocesses that are believed to take place in the visual hierarchy. Horizontal binding of similar features, in particular, is implemented by gathering sets of up to an exponential number of similar regularities in special distributed representations, called hyperstrings (see Appendix C). A hyperstring represents those regularities in such a way that they, for all intents and purposes in minimal coding, can be processed further as if they constituted a single regularity. This means that those regularities can be hierarchically recoded in a transparallel fashion, that is, simultaneously as if only one regularity were concerned—thus solving the computationally heavy combinatorial search for simplest hierarchical organizations. This led to the idea that hyperstrings can be seen as formal counterparts of those temporarily synchronized neural assemblies, so that, inversely, synchronization in those transient assemblies can be seen as a manifestation of transparallel processing. Notice that, unlike standard PDP, transparallel processing by hyperstrings is feasible on classical computers, giving them (for some computing tasks) the same extraordinary computing power as that promised by quantum computers (for some other computing tasks; see van der Helm, 2015).
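To give a feel for what extracting regularities to find a simplest code means, the following toy sketch tries two regularities (repetition and mirror symmetry) on a whole string and keeps the description with the fewest remaining symbols. It is not PISA: it is non-hierarchical, uses plain brute force, and involves neither hyperstrings nor transparallel processing. The function names, the code notation, and the symbol-count measure of complexity are assumptions made for illustration.

```python
def iteration_code(s):
    """Return (code, load) for the shortest repeating unit of s, or None."""
    n = len(s)
    for unit_len in range(1, n // 2 + 1):
        if n % unit_len == 0 and s[:unit_len] * (n // unit_len) == s:
            return f"{n // unit_len}*({s[:unit_len]})", unit_len
    return None


def symmetry_code(s):
    """Return (code, load) if s is mirror symmetric (a palindrome), else None."""
    n = len(s)
    if n < 2 or s != s[::-1]:
        return None
    half, pivot = s[: n // 2], s[n // 2 : (n + 1) // 2]
    code = f"S[({half})" + (f",({pivot})]" if pivot else "]")
    return code, len(half) + len(pivot)


def simplest_code(s):
    """Pick the candidate description with the fewest remaining symbols."""
    candidates = [(s, len(s))]  # literal description as fallback
    for encoder in (iteration_code, symmetry_code):
        result = encoder(s)
        if result is not None:
            candidates.append(result)
    return min(candidates, key=lambda c: c[1])


for string in ("ababab", "abcba", "abcd"):
    print(string, "->", simplest_code(string))
```

Even in this toy setting, the number of candidate descriptions grows quickly once regularities may be nested inside one another, which is the combinatorial search that, according to the text, hyperstrings are meant to tame.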

The transparallel recoding of similar features yields a hierarchy of feature constellations—that is, in PISA, a hierarchy of hyperstrings, and in PATVISH, a hierarchy of synchronized neural assemblies. From this hierarchy of feature constellations, different features are selected to be integrated into percepts. Thus, transparallel processing underlies the perceptual integration capability—as distinct from the feedforward extraction of visual features. This distinction between extraction and integration agrees with that between base-grouping and incremental grouping as put forward by Roelfsema (2006; see also Lamme & Roelfsema, 2000; Roelfsema & Houtkamp, 2011), who, however, did not provide a computational account like transparallel processing.

To give a sense of the timing of these processes, the so-called fast feedforward sweep reaches the top end of the visual hierarchy in about 100 ms (Lamme & Roelfsema, 2000; Tovee, 1994). In some cases, this feedforward sweep may be sufficient to detect particular features. For instance, to discriminate between two clear-cut categories—say, animate versus inanimate structures, or rural versus city structures—holistic spatial organizations are not needed because one can rely on a large variety of local features to quickly complete the task (cf. Kirchner & Thorpe, 2006). Feature conjunctions, however, require more than that. For instance, binocular depth information kicks in around 100–200-ms post-stimulus onset (Ritter, 1980). Furthermore, Makin et al. (2016) investigated detection of single and multiple symmetries, repetitions, and Glass patterns, in fairly simple multi-element stimuli. They recorded the sustained posterior negativity (SPN)—an event-related potential (ERP) generated by visual regularities—and found that it correlates highly with behavioural data, particularly around 300–400-ms post-stimulus onset. It is therefore also plausible that processes manifesting synchronization play a part in this—after all, synchronization arises around 150–400-ms post-stimulus onset (Kveraga et al., 2011; Tallon-Baudry & Bertrand, 1999).

In sum, by PATVISH, the process of perceptual organization comprises a gradual buildup—through successive groupings (cf. Palmer, Brooks, & Nelson, 2003) with feedback loops (cf. Lee & Mumford, 2003)—from percepts of parts to percepts of wholes. Such a gradual buildup takes time, so, in principle, it leaves room for top-down processes to intrude and modulate things before a percept has completed. In this sense, PATVISH does not exclude influences from higher cognitive levels. However, it also postulates that—due to transparallel processing—the perceptual organization process is so fast (or efficient) that it, by then, already has done most of its job. This opens a new perspective on the limits of top-down influences on perception. Next, I discuss several implications.

Receptive fields

The perceptual organization process is not neurally encapsulated in the visual hierarchy, but for the moment, suppose it is. Even then, PATVISH implies that it involves top-down processing, namely, by the subprocess of recurrent selection and integration. This subprocess takes pieces of information from a lower level, integrates them at a higher level, and feeds information about the result back to update the lower level (Lee & Mumford, 2003). This has consequences for what is called a neuron’s receptive field (RF).

The classical receptive field (cRF) of a neuron is defined by the region of the retina to which the neuron is connected by way of feedforward connections (Hubel & Wiesel, 1968). Going up in the visual hierarchy, cRFs increase in size, which suggests that neurons at any level in the visual hierarchy can be conceived of as feature detectors, the output of which is simply summed by neurons with larger cRFs at the next level. This also suggests that vision involves only the fast feedforward sweep. However, via horizontal and recurrent connections, a neuron also receives input from neurons at the same and higher levels in the visual hierarchy. This suggests that a neuron is context sensitive, that is, responsive to local features outside its cRF and global features extending beyond its cRF. This context sensitivity—which does not rely on input from higher cognitive levels beyond the visual hierarchy—is not only implied by PATVISH but also supported by neuroscientific evidence (Gilbert, 1992; Lamme et al., 1998; Self et al., 2016; Smith & Muckli, 2010; Vetter, Smith, & Muckli, 2014). To be clear, I think that the cRF remains a useful concept in neuroscientific settings. The foregoing suggests, however, that its definition is too limited to capture a neuron’s effective RF in cognitive settings.


In behavioral perception experiments, participants respond to a task on the basis of what they think they saw. Hence, responses are based on perception in combination with task-driven top-down attention. Various forms of attention have been distinguished (Footnote 2), but notice that attention—of whatever form and involving whatever action—is basically the allocation of processing resources (Anderson, 2004). This may imply an enhancement of stimulus information focused on (cf. Nandy, Nassi, & Reynolds, 2017), but it neither prescribes how this information is (or has been) organized nor how it interacts with information outside the focus of attention. For instance, you must have perceived a bright flash before your attention is drawn by it. Furthermore, even if attention is directed specifically to stimulus parts relevant to a task, other stimulus parts may still affect responses to this task (e.g., Palmer & Hemenway, 1978; van der Helm & Treder, 2009).

PATVISH leaves room for attention to have measurable effects throughout the visual hierarchy—for instance, related to preparatory arrangements regarding what is focused on (Self et al., 2016). Its focus, however, is on the processing of stimulus information, and in this context, it postulates that attention also scrutinizes established perceptual organizations in a top-down fashion (Fig. 2, right-hand panel). This means that it starts with global structures represented at higher levels in the visual hierarchy, and if required by task and allowed by time, may descend to local features represented at lower levels. This agrees with reverse hierarchy theory (RHT) as proposed by Hochstein and Ahissar (2002; see also Ahissar & Hochstein, 2004; Wolfe, 2007; for neurophysiological evidence, see Campana et al., 2016). RHT, by the way, focuses mainly on the attention side and, unlike PATVISH, less on processing details at the perception side.
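The top-down scrutiny described above can be sketched as a level-by-level descent through an already-established hierarchical organization that stops when time runs out. The sketch assumes a breadth-first visiting order and an abstract step budget standing in for "allowed by time"; the function name, the dictionary layout, and the example scene are hypothetical.

```python
from collections import deque


def scrutinize(organization, budget):
    """Visit a hierarchical organization top-down, global structures first.

    Assumption: each visited node costs one step, and the descent stops
    as soon as the step budget is used up.
    """
    visited = []
    queue = deque([organization])
    while queue and budget > 0:
        node = queue.popleft()
        visited.append(node["label"])
        budget -= 1
        queue.extend(node.get("parts", []))  # parts are inspected after wholes
    return visited


# Hypothetical percept: a whole made of two triangles, each with local edges.
scene = {
    "label": "whole",
    "parts": [
        {"label": "triangle_1", "parts": [{"label": "edge_a"}, {"label": "edge_b"}]},
        {"label": "triangle_2", "parts": [{"label": "edge_c"}]},
    ],
}

print(scrutinize(scene, budget=3))   # short viewing time: global structures only
print(scrutinize(scene, budget=10))  # ample time: descends to local features
```

With a small budget only the whole and its main parts are reached, whereas a larger budget lets the descent continue to local features. Note that the organization is the input of this routine: the sketch inspects an established hierarchy and does not build or alter it.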

The combination of perceptual organization and attention obviates the idea that perception comprises V1 activity only (Gur, 2015). Inspired by the fact that we can be aware of details, this idea relied on the preservation of details represented in V1. However, the foregoing implies that details are preserved and attainable also if perception is taken to comprise more than just V1 activity. It also agrees with findings that figure-ground segregation—which is part of perceptual organization—can take place outside the focus of attention (i.e., independently of attention, or preattentively), and that not attention but the figure-ground assignment itself is responsible for an enhancement of figures relative to grounds (Hecht, Cosman, & Vecera, 2016; Kimchi & Peterson, 2008). Because of this enhancement, attention may subsequently be drawn more to figures than to grounds (Nelson & Palmer, 2007), but the point is that, by then, the figure-ground segregation already has done its job. Next, I discuss three implications to further illustrate that perceptual organization supplies, fairly autonomously, input for top-down attention.

First, in visual search, a “pop-out” is a target that is detected fast and independently of the number of distractors (e.g., a red item among blue items; Treisman & Gelade, 1980). However, a target is a pop-out not by its own merits but by the merits of the distractors: The search for a target is easier as the distractors are more similar to each other and more different from the target (Duncan & Humphreys, 1989; Wolfe, 2007). Hence, for a target to be detected, properties of all elements have to be processed first (for evidence, see Conci, Toellner, Leszczynski, & Müller, 2011; Conci, Müller, & von Mühlenen, 2013), which may well involve lateral inhibition among similar things so that the target rises above the distractors. As argued in van der Helm (2016), it is therefore plausible that the similarity of the distractors is represented first in lower visual areas and that the representation of the target ends up in higher visual areas. This suggests that a pop-out is a pop-out not because it is (unconsciously) processed first by perceptual processes but because its representation ends up in higher visual areas so that it is among the first things (consciously) encountered by top-down attentional processes.

Second, whereas perceptual organization logically processes parts before wholes, the top-down attentional scrutiny of hierarchical organizations implies that wholes are experienced before parts. The latter explains the phenomenon of global dominance as postulated by the early 20th-century Gestaltists (Koffka, 1935; Köhler, 1929; Wertheimer, 1923). This is the phenomenon that, perceptually, global structures are more important than local features. For instance, we typically classify things on the basis of their perceived global structures rather than on the basis of their physical local features, and their perceived global structures determine which local features we perceive as their parts. This global dominance has been confirmed in behavioural studies (for a review, see Wagemans et al., 2012), in which it has been specified further by notions such as global precedence (Navon, 1977), configural superiority (Pomerantz, Sager, & Stoever, 1977), primacy of holistic properties (Kimchi, 1992), and superstructure dominance (Leeuwenberg & van der Helm, 1991; Leeuwenberg, van der Helm, & van Lier, 1994). It also agrees with Hochstein and Ahissar’s (2002) RHT and Campana et al.’s (2016) neurophysiological evidence.

Third, what if the perceptual integration of local features into global structures is hampered? By PATVISH, this could be caused by impaired gamma synchronization, as, for instance, found in autism spectrum disorders (ASD; Footnote 3) (Grice et al., 2001; Maxwell et al., 2015; Sun et al., 2012). Then, top-down attention will hardly encounter perceived global structures and will have better access to embedded figures (Fig. 1ac), that is, to local features that are incompatible with typically perceived global structures (van der Helm, 2016). Better than typical access to embedded figures is exactly what has been found in ASD (Frith, 1989; Jolliffe & Baron-Cohen, 1997; Shah & Frith, 1983).

Perceptual organization and thinking

If standard PDP were the only form of processing in the brain, then everything would influence everything, and seeing and thinking would be inextricable. Synchronization in transient neural assemblies changes the game, however. Higher cognitive functions seem to be mediated by processes manifesting synchronization involving relatively slow oscillations in the 4–30-Hz theta, alpha, and beta bands, whereas perceptual organization seems to be mediated by processes manifesting synchronization involving relatively fast oscillations in the 30–70-Hz gamma band (see Appendix B). By PATVISH, this fast gamma synchronization is a manifestation of transparallel processing, which, in classical computers, has the same extraordinary computing power as that promised by quantum computers. The foregoing implies that it is plausible to make a functional distinction between fairly autonomous perceptual organization and higher cognitive functions.

This functional distinction does not mean that they do not cooperate. After all, as indicated, there are both preperceptual and postperceptual effects of attention. Perceptual organization, however, is like an Olympic 100-m sprint: it may involve preparation beforehand and scrutiny afterwards, but the sprint itself is over in a jiffy. For instance, because it is perceptual organization that organizes scenes into objects arranged in space, objects are the output of perception, not the input—so, object-based attention can only be postperceptual. Furthermore, by PATVISH, candidate organizations are assigned complexity-based probabilities of being perceived. Thus, different organizations may have nearly the same probability of being perceived, which holds, in particular, for visually ambiguous or bistable figures. Then, prolonged viewing or a shift in focus may trigger a switch between such organizations (Suzuki & Peterson, 2000), but notice that these organizations and their bistability had already been supplied by perceptual organization.

The latter illustrates that perceptual organization provides the starting point for subsequent cognitive structuring of, among other things, attention, generalization, learning, and memory (Conci et al., 2013; Kanizsa, 1985; Rock, 1985). For instance, remaining perceptual ambiguities may be resolved by heuristic knowledge such as: light usually comes from above, objects usually are viewed from above, surfaces are usually convex, etc. Furthermore, knowledge can be invoked to recognize the objects supplied by perceptual organization and to enrich percepts to the level of what we call seeing in everyday life. For instance, “I see a chair” is actually short for “I see an object, and based on knowledge, I recognize it as something one can sit on”. This illustrates that the fast and unconscious process that makes you see the object is perceptual organization, while the rest is relatively slow conscious thinking. All in all, I think that perceptual organization is a fairly autonomous process, which, by and large, is unaffected by thinking.

It is true that this article started from a specific model of perceptual organization, but speed (or efficiency) may also be a critical factor in other neuro-cognitive models, and I hope this article stimulates further research into this. Furthermore, this article surely does not settle the seeing-versus-thinking debate. On the one hand, it shows that various (especially early) effects of attention in the visual hierarchy are not necessarily effects on perceptual organization. On the other hand, Peterson and coworkers, for instance, reported evidence for effects of object recognition, memory, and past experience on figure-ground perception (see, e.g., Peterson & Gibson, 1994; Trujillo et al., 2010). Such evidence has to be taken seriously, although it cannot be said to prove cognitive penetrability. Peterson and Gibson (1994, p. 561), for instance, pointed out that the orientation dependence of their results demonstrates that their phenomena are not dependent on semantic knowledge. Furthermore, Firestone and Scholl (2015) argued that such effects merely reflect an increasing sensitivity over time to certain visual features and do not involve effects of knowledge per se. In other words, just as preparatory attentional arrangements, such effects may apply to the input of the perceptual organization process, but not necessarily to what this process does with the input it receives. Trujillo et al. (2010), for instance, found effects of past experience in early (106–156 ms) ERPs but not in the figural outcomes. Be that as it may, further research certainly is needed, and based on this article, this might be guided by, for instance, the next three questions.

First, transparallel processing is an extraordinarily powerful form of processing that is feasible in classical computers, but does it indeed also underlie gamma synchronization in the visual hierarchy? Further investigation into this question might focus on feature binding in horizontal neural assemblies, which has been associated with gamma synchronization. By PATVISH, this subprocess is a crucial part of visual processing, but thus far, it has been a relatively underexposed topic in cognitive neuroscience.

Second, embedded figures are local features that are incompatible with typically perceived global structures (Fig. 1), and the phenomenon that ASD individuals are better than typical individuals in detecting them has been attributed to either enhanced local processing (Mottron & Burack, 2001) or reduced global processing (Frith, 1989). A critical question then is: are ASD individuals better also in detecting compatible features? Enhanced local processing implies they are, whereas by PATVISH, reduced global processing implies they are not (van der Helm, 2016).

Third, microgenetic analyses on amodal completion, for instance, have shown that the domain of perception lies within the first 500 ms after stimulus onset (Sekuler & Palmer, 1992; see also Breitmeyer & Ogmen, 2006). By PATVISH, higher cognitive functions take over after that, but are transitional changes visible in, for instance, electroencephalographic data? In research on multiple symmetry perception, a first indication hereof (Makin et al., 2016) helped to reconcile seemingly opposed ideas that actually hold for different time windows after stimulus onset (see Hamada et al., 2016).


The seeing-versus-thinking debate in cognitive science has been muddled by different and unclear definitions of perception. Therefore, this article focused on the well-defined perceptual process that organizes scenes into objects arranged in space, which is a pivotal process between sensory input and percepts. In this perceptual organization process, as modeled here, similar features are hierarchically recoded extremely efficiently, so that the whole process arrives quickly at hierarchical organizations likely to be perceived. Several factors may modify the input of this process but not necessarily what the process does with the input it receives. An important role of attention, in particular, seems to be top-down scrutiny of already-established hierarchical organizations, that is, starting with global structures, and if required by task and allowed by time, descending to local features. Furthermore, thinking processes may, of course, enrich the outcome of the perceptual organization process, but they are relatively slow and can therefore hardly intrude into the process itself. In other words, we think about what we see rather than that we see what we think.


  1.

    The visual hierarchy is a cognitive structure in the brain that begins in V1 and that, at its top end, merges into higher cognitive structures. V1 receives retinal input via the lateral geniculate nucleus and its information bifurcates, via higher visual areas, into ventral and dorsal streams dedicated to object perception and spatial perception, respectively (Ungerleider & Mishkin, 1982). The neural network in the visual hierarchy is organized with 10–14 distinguishable hierarchical levels (with multiple distinguishable areas within each level), contains many short-range and long-range connections (both within and between levels), and can be said to perform distributed hierarchical processing (Felleman & van Essen, 1991).

  2.

    For instance, distinctions have been made between selective and divided attention (i.e., concentrated on a specific thing vs. divided over several things); between overt and covert attention (i.e., actively directed gaze vs. purely mental focus); and between exogenous bottom-up and endogenous top-down attention (i.e., drawn by stimuli like a bright flash vs. directed to stimuli in function of a task).

  3.

    Autism spectrum disorders (ASD) are complex neurodevelopmental disorders, the severity of which is based on social communication impairments and restricted repetitive patterns of behavior (American Psychiatric Association, 2013). In addition to these diagnostic features, ASD individuals show atypical cognitive processing, particularly in the visual domain (Dakin & Frith, 2005; Simmons et al., 2009).


  1. Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457–464. doi:10.1016/j.tics.2004.08.011.

  2. American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders, 5th edn. Arlington, VA: American Psychiatric Publishing.

  3. Anderson, J. R. (2004). Cognitive psychology and its implications. Richmond, UK: Worth Publishers.

  4. Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions, 53, 370–418. doi:10.1098/rstl.1763.0053.

  5. Bertrand, J. L. (1889). Calcul des probabilités [Calculation of probabilities]. Paris: Gauthier-Villars.

  6. Bojak, I., & Liley, D. T. J. (2007). Self-organized 40-Hz synchronization in a physiological theory of EEG. Neurocomputing, 70, 2085–2090. doi:10.1016/j.neucom.2006.10.087.

  7. Breitmeyer, B. G., & Ogmen, H. (2006). Visual masking: Time slices through conscious and unconscious vision. Oxford, UK: Oxford University Press.

  8. Campana, F., Rebollo, I., Urai, A., Wyart, V., & Tallon-Baudry, C. (2016). Conscious vision proceeds from global to local content in goal-directed tasks and spontaneous vision. Journal of Neuroscience, 36, 5200–5216. doi:10.1523/JNEUROSCI.3619-15.2016.

  9. Churchland, P. S. (1986). Neurophilosophy. Cambridge, MA: MIT Press.

  10. Clifford, W. K. (1890). Seeing and thinking. London: Macmillan and Co.

  11. Cohen, J. (2015). Perceptual representation, veridicality, and the interface theory of perception. Psychonomic Bulletin and Review, 22, 1512–1518. doi:10.3758/s13423-014-0782-3.

  12. Conci, M., Müller, H. J., & von Mühlenen, A. (2013). Object-based implicit learning in visual search: Perceptual segmentation constrains contextual cueing. Journal of Vision, 13, 1–17. doi:10.1167/13.3.15.

  13. Conci, M., Toellner, T., Leszczynski, M., & Müller, H. J. (2011). The time-course of global and local attentional guidance in Kanizsa-figure detection. Neuropsychologia, 49, 2456–2464. doi:10.1016/j.neuropsychologia.2011.04.023.

  14. Csathó, Á., van der Vloed, G., & van der Helm, P. (2003). Blobs strengthen repetition but weaken symmetry. Vision Research, 43, 993–1007. doi:10.1016/S0042-6989(03)00073-7.

  15. Dakin, S., & Frith, U. (2005). Vagaries of visual perception in autism. Neuron, 48, 497–507. doi:10.1016/j.neuron.2005.10.018.

  16. Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433–458. doi:10.1037/0033-295X.96.3.433.

  17. Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., & Reitboeck, H. J. (1988). Coherent oscillations: A mechanism of feature linking in the visual cortex? Biological Cybernetics, 60, 121–130. doi:10.1007/BF00202899.

  18. Eckhorn, R., Bruns, A., Saam, M., Gail, A., Gabriel, A., & Brinksmeyer, H. J. (2001). Flexible cortical gamma-band correlations suggest neural principles of visual processing. Visual Cognition, 8, 519–530. doi:10.1080/13506280143000098.

  19. Edelman, G. M. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.

  20. Feldman, J. (2013). Tuning your priors to the world. Topics in Cognitive Science, 5, 13–34. doi:10.1111/tops.12003.

  21. Felleman, D. J., & van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47. doi:10.1093/cercor/1.1.1.

  22. Firestone, C., & Scholl, B. (2015). Cognition does not affect perception: Evaluating the evidence for “top-down” effects. Behavioral and Brain Sciences. doi:10.1017/S0140525X15000965.

  23. Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.

  24. Fries, P., Roelfsema, P. R., Engel, A. K., König, P., & Singer, W. (1997). Synchronization of oscillatory responses in visual cortex correlates with perception in interocular rivalry. Proceedings of the National Academy of Sciences USA, 94, 12699–12704. doi:10.1073/pnas.94.23.12699.

  25. Friston, K. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13, 293–301. doi:10.1016/j.tics.2009.04.005.

  26. Frith, U. (1989). Autism: Explaining the enigma. Oxford, UK: Basil Blackwell.

  27. Gilbert, C. D. (1992). Horizontal integration and cortical dynamics. Neuron, 9, 1–13. doi:10.1016/0896-6273(92)90215-Y.

  28. Gray, C. M. (1999). The temporal correlation hypothesis of visual feature integration: Still alive and well. Neuron, 24, 31–47. doi:10.1016/S0896-6273(00)80820-X.

  29. Gray, C. M., & Singer, W. (1989). Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proceedings of the National Academy of Sciences USA, 86, 1698–1702. doi:10.1073/pnas.86.5.1698.

  30. Grice, S. J., Spratling, M. W., Karmiloff-Smith, A., Halit, H., Csibra, G., de Haan, M., & Johnson, M. H. (2001). Disordered visual processing and oscillatory brain activity in autism and Williams syndrome. NeuroReport, 12, 2697–2700. doi:10.1097/00001756-200108280-00021.

  31. Gur, M. (2015). Space reconstruction by primary visual cortex activity: A parallel, non-computational mechanism of object representation. Trends in Neurosciences, 38, 207–216. doi:10.1016/j.tins.2015.02.005.

  32. Hamada, J., Amano, K., Fukuda, S. T., Uchiumi, C., Fukushi, K., & van der Helm, P. A. (2016). A group theoretical model of symmetry cognition. Acta Psychologica, 171, 128–137. doi:10.1016/j.actpsy.2016.10.002.

  33. Hebb, D. O. (1949). The organization of behavior. New York: Wiley.

  34. Hecht, L. N., Cosman, J. D., & Vecera, S. P. (2016). Enhanced spatial resolution on figures versus grounds. Attention, Perception, and Psychophysics. doi:10.3758/s13414-016-1099-2.

  35. Hochberg, J. E. (1978). Perception, 2nd edn. Englewood Cliffs, NJ: Prentice-Hall.

  36. Hochberg, J. E., & McAlister, E. (1953). A quantitative approach to figural “goodness”. Journal of Experimental Psychology, 46, 361–364. doi:10.1037/h0055809.

  37. Hochstein, S., & Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791–804. doi:10.1016/S0896-6273(02)01091-7.

  38. Höffding, H. (1891). Psychische und physische Activität [Mental and physical activity]. Vierteljahrsschrift für wissenschaftliche Philosophie, 15, 233–250.

  39. Hoffman, D. D., Singh, M., & Prakash, C. (2015). The interface theory of perception. Psychonomic Bulletin and Review, 22, 1480–1506. doi:10.3758/s13423-015-0890-8.

  40. Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology (London), 195, 215–243. doi:10.1113/jphysiol.1968.sp008455.

  41. Jolliffe, T., & Baron-Cohen, S. J. (1997). Are people with autism and Asperger syndrome faster than normal on the Embedded Figures Test? Journal of Child Psychology and Psychiatry, 38, 527–534. doi:10.1111/j.1469-7610.1997.tb01539.x.

  42. Kahana, M. J. (2006). The cognitive correlates of human brain oscillations. Journal of Neuroscience, 26, 1669–1672. doi:10.1523/JNEUROSCI.3737-05c.2006.

  43. Kanizsa, G. (1985). Seeing and thinking. Acta Psychologica, 59, 23–33. doi:10.1016/0001-6918(85)90040-X.

  44. Kastens, K. A., & Ishikawa, T. (2006). Spatial thinking in the geosciences and cognitive sciences: A cross-disciplinary look at the intersection of the two fields. Geological Society of America Special Papers, 413, 53–76. doi:10.1130/2006.2413(05).

  45. Keil, A., Muller, E. M., Ray, W. J., Gruber, T., & Elbert, T. (1999). Human gamma band activity and perception of a Gestalt. Journal of Neuroscience, 19, 7152–7161.

  46. Kimchi, R. (1992). Primacy of holistic processing and global/local paradigm: A critical review. Psychological Bulletin, 112, 24–38. doi:10.1037/0033-2909.112.1.24.

  47. Kimchi, R., & Peterson, M. A. (2008). Figure-ground segmentation can occur without attention. Psychological Science, 19, 660–668. doi:10.1111/j.1467-9280.2008.02140.x.

  48. Kirchner, H., & Thorpe, S. J. (2006). Ultra-rapid object detection with saccadic eye movements: Visual processing speed revisited. Vision Research, 46, 1762–1776. doi:10.1016/j.visres.2005.10.002.

  49. Koffka, K. (1935). Principles of Gestalt psychology. London: Routledge and Kegan Paul.

  50. Köhler, W. (1929). Gestalt psychology. New York: Liveright.

  51. Kopell, N., Ermentrout, G. B., Whittington, M. A., & Traub, R. D. (2000). Gamma rhythms and beta rhythms have different synchronization properties. Proceedings of the National Academy of Sciences USA, 97, 1867–1872. doi:10.1073/pnas.97.4.1867.

  52. Kveraga, K., Ghuman, A. S., Kassam, K. S., Aminoff, E. A., Hämäläinen, M. S., Chaumon, M., & Bar, M. (2011). Early onset of neural synchronization in the contextual associations network. Proceedings of the National Academy of Sciences USA, 108, 3389–3394. doi:10.1073/pnas.1013760108.

  53. Lee, T. S., & Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 20, 1434–1448. doi:10.1364/JOSAA.20.001434.

  54. Leeuwenberg, E. L. J., & van der Helm, P. (1991). Unity and variety in visual form. Perception, 20, 595–622. doi:10.1068/p200595.

  55. Leeuwenberg, E. L. J., & van der Helm, P. A. (2013). Structural information theory: The simplicity of visual form. Cambridge, UK: Cambridge University Press.

  56. Leeuwenberg, E. L. J., van der Helm, P. A., & van Lier, R. J. (1994). From geons to structure: A note on object classification. Perception, 23, 505–515. doi:10.1068/p230505.

  57. Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579. doi:10.1016/S0166-2236(00)01657-X.

  58. Lamme, V. A. F., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8, 529–535. doi:10.1016/S0959-4388(98)80042-1.

  59. Locher, P., & Wagemans, J. (1993). Effects of element type and spatial grouping on symmetry detection. Perception, 22, 565–587. doi:10.1068/p220565.

  60. Lu, H. J., Morrison, R. G., Hummel, J. E., & Holyoak, K. J. (2006). Role of gamma-band synchronization in priming of form discrimination for multiobject displays. Journal of Experimental Psychology: Human Perception and Performance, 32, 610–617. doi:10.1037/0096-1523.32.3.610.

  61. Makin, A. D. J., Wright, D., Rampone, G., Palumbo, L., Guest, M., Sheehan, R., Cleaver, H., & Bertamini, M. (2016). An electrophysiological index of perceptual goodness. Cerebral Cortex. doi:10.1093/cercor/bhw255.

  62. Maxwell, C. R., Villalobos, M. E., Schultz, R. T., Herpertz-Dahlmann, B., Konrad, K., & Kohls, G. (2015). Atypical laterality of resting gamma oscillations in autism spectrum disorders. Journal of Autism and Developmental Disorders, 45, 292–297. doi:10.1007/s10803-013-1842-7.

  63. Milner, P. (1974). A model for visual shape recognition. Psychological Review, 81, 521–535. doi:10.1037/h0037149.

  64. Mottron, L., & Burack, J. A. (2001). Enhanced perceptual functioning in the development of autism. In Burack, J. A., Charman, T., Yirmiya, N., & Zelazo, P. R. (Eds.), The Development of Autism: Perspectives from theory and research (pp. 131–148). Mahwah, NJ: Erlbaum.

  65. Nandy, A. S., Nassi, J. J., & Reynolds, J. H. (2017). Laminar organization of attentional modulation in macaque visual area V4. Neuron, 93, 235–246. doi:10.1016/j.neuron.2016.11.029.

  66. Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9, 353–383. doi:10.1016/0010-0285(77)90012-3.

  67. Nelson, R. A., & Palmer, S. E. (2007). Familiar shapes attract attention in figure-ground displays. Perception and Psychophysics, 69, 382–392. doi:10.3758/BF03193759.

  68. Palmer, S. E., Brooks, J. L., & Nelson, R. (2003). When does grouping happen? Acta Psychologica, 114, 311–330. doi:10.1016/j.actpsy.2003.06.003.

  69. Palmer, S. E., & Hemenway, K. (1978). Orientation and symmetry: Effects of multiple, rotational, and near symmetries. Journal of Experimental Psychology: Human Perception and Performance, 4, 691–702. doi:10.1037/0096-1523.4.4.691.

  70. Peterson, M. A., & Gibson, B. S. (1994). Object recognition contributions to figure-ground organization: Operations on outlines and subjective contours. Perception and Psychophysics, 56, 551–564. doi:10.3758/BF03206951.

  71. Pizlo, Z. (2015). Philosophizing cannot substitute for experimentation: Comment on Hoffman, Singh and Prakash (2014). Psychonomic Bulletin and Review, 22, 1546–1547. doi:10.3758/s13423-014-0760-9.

  72. Pomerantz, J. R., Sager, L. C., & Stoever, R. J. (1977). Perception of wholes and their component parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception and Performance, 3, 422–435. doi:10.1037/0096-1523.3.3.422.

  73. Pylyshyn, Z. W. (1999). Is vision continuous with cognition? The case of impenetrability of visual perception. Behavioral and Brain Sciences, 22, 341–423. doi:10.1017/S0140525X99002022.

  74. Ritter, M. (1980). Perception of depth: Different processing times for simple and relative positional disparity. Psychological Research, 41, 285–295. doi:10.1007/BF00308874.

  75. Rock, I. (1985). Perception and knowledge. Acta Psychologica, 59, 3–22. doi:10.1016/0001-6918(85)90039-3.

  76. Roelfsema, P. R. (2006). Cortical algorithms for perceptual grouping. Annual Review of Neuroscience, 29, 203–227. doi:10.1146/annurev.neuro.29.051605.112939.

  77. Roelfsema, P. R., & Houtkamp, R. (2011). Incremental grouping of image elements in vision. Attention, Perception and Psychophysics, 73, 2542–2572. doi:10.3758/s13414-011-0200-0.

  78. Sejnowski, T. J., & Paulsen, O. (2006). Network oscillations: Emerging computational principles. Journal of Neuroscience, 26, 1673–1676. doi:10.1523/JNEUROSCI.3737-05d.2006.

  79. Sekuler, A. B., & Palmer, S. E. (1992). Perception of partly occluded objects: A microgenetic analysis. Journal of Experimental Psychology: General, 121, 95–111. doi:10.1037/0096-3445.121.1.95.

  80. Self, M. W., Peters, J. C., Possel, J. K., Reithler, J., Goebel, R., Ris, P., Jeurissen, D., Reddy, L., Claus, S., Baayen, J. C., & Roelfsema, P. R. (2016). The effects of context and attention on spiking activity in human early visual cortex. PLoS Biology, 14, e1002420. doi:10.1371/journal.pbio.1002420.

  81. Shadlen, M. N., & Movshon, J. A. (1999). Synchrony unbound: A critical evaluation of the temporal binding hypothesis. Neuron, 24, 67–77. doi:10.1016/S0896-6273(00)80822-3.

  82. Shah, A., & Frith, U. (1983). An islet of ability in autistic children: A research note. Journal of Child Psychology and Psychiatry, 24, 613–620. doi:10.1111/j.1469-7610.1983.tb00137.x.

  83. Simmons, D. R., Robertson, A. E., McKay, L. S., Toal, E., McAleer, P., & Pollick, F. E. (2009). Vision in autism spectrum disorders. Vision Research, 49, 2705–2739. doi:10.1016/j.visres.2009.08.005.

  84. Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18, 555–586. doi:10.1146/annurev.ne.18.030195.003011.

  85. Smith, F. W., & Muckli, L. (2010). Nonstimulated early visual areas carry information about surrounding context. Proceedings of the National Academy of Sciences USA, 107, 20099–20103. doi:10.1073/pnas.1000233107.

  86. Sun, L., Grützner, C., Bölte, S., Wibral, M., Tozman, T., Schlitt, S., Poustka, F., Singer, W., Freitag, C. M., & Uhlhaas, P. J. (2012). Impaired gamma-band activity during perceptual organization in adults with autism spectrum disorders: Evidence for dysfunctional network activity in frontal-posterior cortices. Journal of Neuroscience, 32, 9563–9573. doi:10.1523/jneurosci.1073-12.2012.

  87. Suzuki, S., & Peterson, M. A. (2000). Multiplicative effects of intention on the perception of bistable apparent motion. Psychological Science, 11, 202–209. doi:10.1111/1467-9280.00242.

  88. Tallon-Baudry, C., & Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences, 3, 151–162. doi:10.1016/S1364-6613(99)01299-1.

  89. Tovee, M. J. (1994). Neural processing: How fast is the speed of thought? Current Biology, 4, 1125–1127. doi:10.1016/S0960-9822(00)00253-0.

  90. Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136. doi:10.1016/0010-0285(80)90005-5.

  91. Trujillo, L. T., Allen, J. J. B., Schnyer, D. M., & Peterson, M. A. (2010). Neurophysiological evidence for the influence of past experience on figure-ground perception. Journal of Vision, 10(2), 5, 1–21. doi:10.1167/10.2.5.

  92. Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In Ingle, D. J., Goodale, M. A., & Mansfield, R. J. W. (Eds.), Analysis of Visual Behavior (pp. 549–586). Cambridge, MA: MIT Press.

  93. van der Helm, P. A. (2000). Simplicity versus likelihood in visual perception: From surprisals to precisals. Psychological Bulletin, 126, 770–800. doi:10.1037//0033-2909.126.5.770.

  94. van der Helm, P. A. (2004). Transparallel processing by hyperstrings. Proceedings of the National Academy of Sciences USA, 101(30), 10862–10867. doi:10.1073/pnas.0403402101.

  95. van der Helm, P. A. (2012). Cognitive architecture of perceptual organization: From neurons to gnosons. Cognitive Processing, 13, 13–40. doi:10.1007/s10339-011-0425-9.

  96. van der Helm, P. A. (2014). Simplicity in vision: A multidisciplinary account of perceptual organization. Cambridge, UK: Cambridge University Press.

  97. van der Helm, P. A. (2015). Transparallel mind: Classical computing with quantum power. Artificial Intelligence Review, 44, 341–363. doi:10.1007/s10462-015-9429-7.

  98. van der Helm, P. A. (2016). A cognitive architecture account of the visual local advantage phenomenon in autism spectrum disorders. Vision Research, 126, 278–290. doi:10.1016/j.visres.2015.04.009.

  99. van der Helm, P. A., & Leeuwenberg, E. L. J. (1996). Goodness of visual regularities: A nontransformational approach. Psychological Review, 103, 429–456. doi:10.1037/0033-295X.103.3.429.

  100. van der Helm, P. A., & Treder, M. S. (2009). Detection of (anti)symmetry and (anti)repetition: Perceptual mechanisms versus cognitive strategies. Vision Research, 49, 2754–2763. doi:10.1016/j.visres.2009.08.015.

  101. van Lier, R. J., van der Helm, P. A., & Leeuwenberg, E. L. J. (1994). Integrating global and local aspects of visual occlusion. Perception, 23, 883–903. doi:10.1068/p230883.

  102. VanRullen, R., & Thorpe, S. J. (2002). Surfing a spike wave down the ventral stream. Vision Research, 42, 2593–2615. doi:10.1016/S0042-6989(02)00298-5.

  103. Vetter, P., Smith, F. W., & Muckli, L. (2014). Decoding sound and imagery content in early visual cortex. Current Biology, 24, 1256–1262. doi:10.1016/j.cub.2014.04.020.

  104. von der Malsburg, C. (1981). The correlation theory of brain function. Internal Report 81-2 Max-Planck-Institute for Biophysical Chemistry. Germany: Göttingen.

  105. von Helmholtz, H. L. F. (1962). Treatise on physiological optics (J. P. C. Southall, Trans.). New York: Dover. (Original work published 1909).

  106. von Stein, A., Chiang, C., & König, P. (2000). Top-down processing mediated by interareal synchronization. Proceedings of the National Academy of Sciences USA, 97, 14748–14753. doi:10.1073/pnas.97.26.14748.

  107. von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: From local gamma to long range alpha/theta synchronization. International Journal of Psychophysiology, 38, 301–313. doi:10.1016/S0167-8760(00)00172-0.

  108. Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., & van Leeuwen, C. (2012). A century of Gestalt psychology in visual perception: II. Conceptual and theoretical foundations. Psychological Bulletin, 138, 1218–1252. doi:10.1037/a0029334.

  109. Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt [On Gestalt theory]. Psychologische Forschung, 4, 301–350. doi:10.1007/BF00410640.

  110. Wolfe, J. M. (2007). Guided search 4.0: Current progress with a model of visual search. In Gray, W. (Ed.), Integrated models of cognitive systems (pp. 99–119). New York: Oxford University Press.

  111. Womelsdorf, T., Fries, P., Mitra, P. P., & Desimone, R. (2006). Gamma-band synchronization in visual cortex predicts speed of change detection. Nature, 439, 733–736. doi:10.1038/nature04258.



I thank Dan Coates, Sander van de Cruys, and Vebjørn Ekroll for their inspiring discussion session, in Vielsalm, on seeing versus thinking. This research was supported by Methusalem grant METH/14/02 awarded to Johan Wagemans (www.gestaltrevision.be).

Author information

Correspondence to Peter A. van der Helm.


Appendix A: How reliable is human vision?

Usually, we rely on vision to guide our actions in the world, but exactly how veridical is vision? For candidate perceptual organizations of a scene, the likelihood and simplicity principles, for instance, predict probabilities of being perceived. Hence, in order for vision to be highly veridical, perceptual organizations with higher probabilities of being perceived should, as a rule, also be the ones that are more likely to be true in the world. One cannot exclude that this is the case, for instance, through evolutionary adaptation of visual systems to the world or, equally plausibly, because we adapted the world such that it matches our visual preferences (think of cities versus jungles).

The scientific assessment of the veridicality of vision, however, is problematic. Notably, one would need to know, independently of vision, (a) the structure of the world, and (b) the probabilities of things in the world. The catch is that probabilities can be assigned only after things have been categorized (Fig. 3), and that different categorizations may imply different probabilities (Bertrand, 1889). This cannot be solved, because the structure of the world and the probabilities of things in it exist only in our heads, after vision has done its work (Feldman, 2013; Hoffman, Singh, & Prakash, 2015).

Fig. 3

Probabilities depend on categorization. For two sticks thrown randomly on the floor, one probably would say intuitively that the four configurations given here decrease from left to right in probability of occurring. However, this holds only after one has classified them as belonging to categories of similar configurations. Without categorization, the four configurations would all be equally likely to occur. (After van Lier, van der Helm, & Leeuwenberg, 1994)

Accordingly, likelihood models usually employ subjective probabilities (i.e., beliefs) to fit empirical data, without veridicality claims. They often apply Bayes’ rule (Bayes & Price, 1763), which holds that the probability of a hypothesis given data equals the normalized product of (a) the prior probability of the hypothesis independently of the data, and (b) the conditional probability of the data if the hypothesis were true. In the same terms, findings in mathematics suggest that the simplicity principle’s priors are probably not veridical, but that its conditionals may well be fairly veridical in many actual or imagined worlds (for more details, see van der Helm, 2000).
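In symbols, the rule described above can be written as follows, where the labels H (hypothesis) and D (data) are notation assumed here for illustration and p(D) is the normalization term:

$$p(H \mid D) = \frac{p(H)\, p(D \mid H)}{p(D)}$$

Here, p(H) is the prior, p(D | H) is the conditional, and p(D) is the sum of p(H') p(D | H') over all candidate hypotheses H'.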

The latter is relevant to moving observers, who update their percepts each time they get another view of the same scene. Visual updating can be modeled by recursive application of Bayes’ rule, and then, the conditionals become decisive. So, although one cannot assess exactly how veridical vision is, the foregoing suggests that a simplicity-based visual system is sufficiently reliable in everyday life to have survived during evolution.
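The role of the conditionals under repeated updating can be illustrated with a minimal sketch. It assumes a finite set of candidate interpretations and conditionally independent views; the interpretation labels "A" and "B", the function names, and all numbers are hypothetical and chosen only for illustration.

```python
def bayes_update(prior, conditional):
    """Apply Bayes' rule once over a finite set of hypotheses.

    prior:       dict mapping hypothesis -> prior probability
    conditional: dict mapping hypothesis -> probability of the current
                 view if that hypothesis were true
    """
    unnormalized = {h: prior[h] * conditional[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def update_over_views(prior, conditionals_per_view):
    """Recursive updating: each posterior serves as the next prior."""
    posterior = dict(prior)
    for conditional in conditionals_per_view:
        posterior = bayes_update(posterior, conditional)
    return posterior


# Hypothetical numbers: interpretation "A" starts with a strong prior,
# but every new view is better explained by interpretation "B".
prior = {"A": 0.9, "B": 0.1}
views = [{"A": 0.2, "B": 0.8}] * 4

after_one_view = update_over_views(prior, views[:1])
after_four_views = update_over_views(prior, views)
print(after_one_view)
print(after_four_views)
```

With these made-up values, "A" is still ahead after one view, but "B" dominates after four, which is the sense in which the conditionals, not the priors, become decisive.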

Appendix B: Neuronal synchronization

Neuronal synchronization is the phenomenon that neurons, in transient assemblies, temporarily synchronize their activity. Such assemblies are thought to arise when neurons shift their allegiance to different groups by altering connection strengths (Edelman, 1987), which may also imply a shift in their specificity and function (Gilbert, 1992); this is not to be confused with neuroplasticity, which involves changes in connectivity. Both theoretically and empirically, neuronal synchronization has been associated with cognitive processing (Eckhorn et al., 1988; Gray & Singer, 1989; Milner, 1974; von der Malsburg, 1981), with a noteworthy distinction between synchronization in the theta, alpha, and beta bands (4–30-Hz oscillations) and synchronization in the gamma band (30–70-Hz oscillations) (Kopell, Ermentrout, Whittington, & Traub, 2000; von Stein & Sarnthein, 2000).

Synchronization in the theta, alpha, and beta bands, on the one hand, seems involved in interactions between relatively distant brain structures. For instance, it has been found to be correlated with top-down processes dealing with aspects of memory, expectancy, and task (Kahana, 2006; von Stein, Chiang, & König, 2000). Synchronization in the gamma band, on the other hand, seems involved in relatively local computations. It has been found to be correlated in particular with visual processes—such as those dealing with change detection, interocular rivalry, feature binding, Gestalt formation, and form discrimination (Fries, Roelfsema, Engel, König, & Singer, 1997; Keil, Muller, Ray, Gruber, & Elbert, 1999; Lu, Morrison, Hummel, & Holyoak, 2006; Singer & Gray, 1995; Womelsdorf, Fries, Mitra, & Desimone, 2006).

In general, neuronal synchronization can be said to reflect a flexible and efficient mechanism subserving the representation of information, the regulation of the flow of information, and the storage and retrieval of information (Sejnowski & Paulsen, 2006). Notice, however, that this characterization is about cognitive factors associated with synchronization rather than about the nature of underlying cognitive processes. In other words, it actually expresses only that synchronization is a manifestation of cognitive processing—just as the bubbles in boiling water are a manifestation of the boiling process (Bojak & Liley, 2007; Shadlen & Movshon, 1999).

Notice further that neuronal synchronization reflects more than standard parallel distributed processing (PDP). Whereas PDP typically involves interacting agents doing different things simultaneously, neuronal synchronization involves interacting agents doing the same thing simultaneously—think of flash mobs or choirs going from cacophony to harmony. Therefore, as advocated in this article, gamma synchronization might well be a manifestation of transparallel feature processing, which means that many similar features are processed in one go, that is, simultaneously as if only one feature were concerned.

A final remark seems in order. The temporal correlation hypothesis (Milner, 1974; von der Malsburg, 1981; for a review, see Gray, 1999) applies to the integration of different features into percepts. It holds that gamma synchronization binds those neurons which, together, represent one perceptual entity (see also Eckhorn et al., 2001). Much can be said for and against this idea (see, e.g., Shadlen & Movshon, 1999), but in any case, this is not the idea this article relies on. In this article, gamma synchronization is related to binding of similar features, which constitutes the basis for integration of different features into objects. This retains the idea that gamma synchronization subserves perceptual integration, but instead of taking synchronization as a force that binds features, it takes it as a manifestation of the further processing of bound features.

Appendix C: Transparallel processing by hyperstrings

The minimal coding algorithm PISA for strings employs transparallel processing by hyperstrings to hierarchically recode exponential numbers of similar features simultaneously as if only one feature were concerned. Full technical exposés on PISA can be found in van der Helm (2004, 2014, 2015); here, I first introduce minimal coding and hyperstrings, and then I illustrate how the latter enable transparallel processing.

To compute simplest codes of strings, PISA employs the mathematically grounded coding language and complexity metric from structural information theory (SIT; for overviews, see Leeuwenberg & van der Helm, 2013; van der Helm, 2014). SIT applies the simplicity principle to make quantitative predictions in visual form perception, which led to empirically successful quantitative models of amodal completion (van Lier et al., 1994) and symmetry perception (van der Helm & Leeuwenberg, 1996). One of the coding rules in SIT’s coding language is the S-rule, which captures bilateral symmetries. For instance, by the S-rule, the string ababfababbabafbaba can be encoded into S[(aba)(b)(f)(aba)(b)], whose argument (aba)(b)(f)(aba)(b) is represented in the graph in Fig. 4 by the path along vertices 1, 4, 5, 6, 9, and 10. In fact, this graph represents, in a distributed fashion, the arguments of all symmetries into which the string can be encoded.
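The S-rule example above can be checked with a minimal sketch. It assumes, as the example suggests, that S[(c1)(c2)...(cn)] stands for the chunks c1...cn followed by the same chunks in reverse order, with each chunk itself left unreversed, and that an optional pivot may sit in the middle. The function name `expand_symmetry` is hypothetical and is not part of PISA.

```python
def expand_symmetry(chunks, pivot=""):
    """Expand S[(c1)(c2)...(cn),(pivot)] into c1 c2 ... cn pivot cn ... c2 c1.

    The order of the chunks is mirrored; the content of each chunk is not.
    """
    return "".join(chunks) + pivot + "".join(reversed(chunks))


# The symmetry argument (aba)(b)(f)(aba)(b) from the text.
argument = ["aba", "b", "f", "aba", "b"]
decoded = expand_symmetry(argument)
print(decoded)
```

Under this assumed reading, the expansion reproduces the 18-element string ababfababbabafbaba given in the text.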

Fig. 4

Hyperstrings. The graph represents the arguments of all symmetries into which the string ababfababbabafbaba can be encoded. The graph is a hyperstring and can therefore be hierarchically recoded as if it were a single normal string h₁h₂h₃h₄h₅h₆h₇h₈h₉, whose substrings correspond one-to-one to hypersubstrings in the graph
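As an illustration of the S-rule example above, the following Python sketch expands a symmetry argument back into its string and enumerates the candidate arguments for a given string. It is not part of PISA: the function names are hypothetical, and the enumeration assumes symmetries that cover the whole string and have no pivot.

```python
def expand_symmetry(chunks, pivot=""):
    """Expand S[(c1)(c2)...(cn),(pivot)] into c1 c2 ... cn pivot cn ... c2 c1.

    The order of the chunks is mirrored; the chunks themselves are not reversed.
    """
    return "".join(chunks) + pivot + "".join(reversed(chunks))


def symmetry_arguments(s):
    """List all chunkings of the first half of s that reproduce s under the S-rule.

    Simplifying assumption (not from the article): s has even length and the
    symmetry has no pivot.
    """
    n = len(s)
    if n % 2:
        return []
    half = n // 2
    results = []

    def extend(start, chunks):
        if start == half:
            results.append(tuple(chunks))
            return
        for end in range(start + 1, half + 1):
            chunk = s[start:end]
            # The same chunk must recur at the mirrored position in the second half.
            if s[n - end:n - start] == chunk:
                extend(end, chunks + [chunk])

    extend(0, [])
    return results


string = "ababfababbabafbaba"
print(expand_symmetry(["aba", "b", "f", "aba", "b"]) == string)
for argument in symmetry_arguments(string):
    print("".join(f"({c})" for c in argument))
```

Under these assumptions, each listed argument corresponds to one path through a graph like the one in Fig. 4, with one vertex per position in the first half of the string.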

To assess which of these symmetries is the simplest one, their arguments have to be hierarchically recoded first. For instance, the argument (aba)(b)(f)(aba)(b) above can be hierarchically recoded into S[((aba)(b)),((f))], which gives a further reduction in complexity (a code’s complexity roughly equals the number of remaining string elements in it). The problem is that there may be up to an exponential number of symmetries into which a string can be encoded, so that it would take a superexponential amount of work and time to recode each of their arguments separately. Provably, however, graphs like the one in Fig. 4 are hyperstrings, which are defined graph-theoretically by:

Definition A.1

A hyperstring is a simple semi-Hamiltonian directed acyclic graph (V,E) with a labeling of the edges in E such that, for all vertices i, j, p, q ∈ V:

$$\text{either}\ \pi(i,j) = \pi(p,q)\ \text{or}\ \pi(i,j) \cap \pi(p,q) = \emptyset $$

where substring set π(v₁,v₂) is the set of label strings represented by the paths between vertices v₁ and v₂; the subgraph on the vertices and edges in these paths is a hypersubstring.
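To make the condition on substring sets concrete, the Python sketch below builds a graph of candidate symmetry chunks for a string and checks, by brute force, that any two substring sets are either identical or disjoint. This is an illustrative check only, not the PISA algorithm: the function names are hypothetical, vertices are numbered from 0 rather than from 1 as in Fig. 4, the graph construction assumes a whole-string symmetry without pivot, and the check does not test the other requirements of the definition (simple, semi-Hamiltonian, acyclic).

```python
from itertools import combinations


def build_symmetry_graph(s):
    """Return (edges, n_vertices) for the first half of s.

    An edge i -> j labelled s[i:j] exists when that chunk recurs at the
    mirrored position in the second half of s.
    """
    n = len(s)
    half = n // 2
    edges = {}
    for i in range(half):
        for j in range(i + 1, half + 1):
            if s[n - j:n - i] == s[i:j]:
                edges.setdefault(i, []).append((j, s[i:j]))
    return edges, half + 1


def substring_set(edges, i, j, cache):
    """pi(i, j): the label strings (tuples of edge labels) of all paths from i to j."""
    if i == j:
        return frozenset({()})
    key = (i, j)
    if key not in cache:
        paths = set()
        for k, label in edges.get(i, []):
            if k <= j:
                for rest in substring_set(edges, k, j, cache):
                    paths.add((label,) + rest)
        cache[key] = frozenset(paths)
    return cache[key]


def satisfies_hyperstring_condition(edges, n_vertices):
    """Check that all pairs of substring sets are identical or disjoint."""
    cache = {}
    sets = [substring_set(edges, i, j, cache)
            for i, j in combinations(range(n_vertices), 2)]
    return all(a == b or a.isdisjoint(b) for a, b in combinations(sets, 2))


edges, n_vertices = build_symmetry_graph("ababfababbabafbaba")
print(satisfies_hyperstring_condition(edges, n_vertices))
print(sorted(substring_set(edges, 0, 4, {})))
```

The brute-force enumeration of paths is exponential in general and serves only to illustrate the definition; the point of hyperstrings is precisely that such an enumeration can be avoided.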

Definition A.1 holds that, in a hyperstring, substring sets represented by hypersubstrings are either completely identical or completely disjoint, never something in between. This implies that the hyperstring in Fig. 4 can be treated as if it were a single normal string H = h₁h₂h₃h₄h₅h₆h₇h₈h₉, whose substrings correspond one-to-one to hypersubstrings. For instance, substrings h₁h₂h₃h₄ and h₆h₇h₈h₉ are identical, because they both represent the substrings (a)(b)(a)(b), (aba)(b), and (a)(bab) in candidate symmetry arguments. In other words, this single identity relationship between substrings in string H corresponds, in one go, to three identity relationships between substrings in candidate symmetry arguments. For instance, this identity relationship in string H means that H could be encoded into the symmetry S[(h₁h₂h₃h₄),(h₅)], which thus represents, in one go, three symmetries in different candidate symmetry arguments, namely:

$$\begin{array}{lll} S[((a)(b)(a)(b)),((f))]&\text{in the argument} &(a)(b)(a)(b)(f)(a)(b)(a)(b)\\ S[((aba)(b)),((f))]&\text{in the argument} &(aba)(b)(f)(aba)(b)\\ S[((a)(bab)),((f))]&\text{in the argument} &(a)(bab)(f)(a)(bab) \end{array} $$

Hence, by encoding the hyperstring, one in fact hierarchically recodes all candidate symmetry arguments in one go, without having to distinguish explicitly between them.
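The correspondence between the single symmetry in H and the three symmetries listed above can be checked with a short Python sketch. The helper name `s_rule` is hypothetical, and the sketch only expands codes; it does not search for them. Each instantiation of h₁h₂h₃h₄ is expanded with pivot (f) into a candidate argument, and each argument is then expanded again to confirm that it yields the original string.

```python
def s_rule(chunks, pivot=()):
    """Expand S[(c1)...(cn),(pivot)] into c1 ... cn pivot cn ... c1.

    Each chunk and the pivot are sequences of elements; the result is the
    flat tuple of elements.
    """
    out = []
    for chunk in chunks:
        out.extend(chunk)
    out.extend(pivot)
    for chunk in reversed(chunks):
        out.extend(chunk)
    return tuple(out)


string = "ababfababbabafbaba"

# The three label strings that h1 h2 h3 h4 (and h6 h7 h8 h9) stand for.
instantiations = [("a", "b", "a", "b"), ("aba", "b"), ("a", "bab")]

arguments = []
for x in instantiations:
    # Hierarchical level: S[(x),(f)] expands into the argument x (f) x.
    argument = s_rule([x], pivot=("f",))
    arguments.append(argument)
    # String level: S[argument] expands into the original string.
    assert "".join(s_rule(argument)) == string
    print("".join(f"({c})" for c in argument))
```

The sketch treats the three instantiations one after the other, which is exactly what encoding the hyperstring avoids; it is meant only to verify the three lines of the display above.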

There is, of course, much more one has to reckon with to get a full-blown minimal coding algorithm (for that, see the full technical exposés). However, the foregoing shows that the candidate symmetry arguments do not have to be recoded in a serial fashion (i.e., one after the other by one processor) or in a parallel fashion (i.e., simultaneously by many processors). Instead, they can be recoded simultaneously by one processor (e.g., a single-processor classical computer) as if only one symmetry argument were concerned. This also holds for the other coding rules in SIT’s coding language, and this is the extraordinary form of processing I dubbed transparallel processing.

About this article

Cite this article

van der Helm, P.A. Human visual perceptual organization beats thinking on speed. Atten Percept Psychophys 79, 1227–1238 (2017). https://doi.org/10.3758/s13414-017-1281-1


  • Attention
  • Cognitive impenetrability
  • Neuronal synchronization
  • Perceptual organization
  • Seeing versus thinking