“But, unless there is something extraordinary about the conceptual capacities of pigeons, our findings show that an animal readily forms a broad and complex concept when placed in a situation that demands one”. Herrnstein and Loveland (1964, p. 551)

Birds master a seemingly endless variety of perceptual categories

The critical function of any brain is to predict the consequences of actions based on sensory stimuli. Analysis of sensory input can be rather simple, for instance when consuming a standardized food item that is directly in the field of view. Often, however, decisions draw on a wealth of past experiences and require complex sensory analysis, since not all stimuli that call for the same action also look the same. Perceptual categorization enables animals to group stimuli based on their sensory features (see Box 1 for formal definitions). This core cognitive ability is executed almost instantaneously and seemingly without effort, and allows functional associations to be assigned to items in the world around us. In fact, categorization occurs on a timescale comparable to the initial detection of an object: category membership can be reported before idiosyncratic identification of an object is possible (Grill-Spector and Kanwisher 2005). Through these operations, organisms handle the endless variety of perceptual input by first recognizing the category of items and subsequently discriminating between, or generalizing across, different stimuli. All of these processes contribute to categorization and the formation of concepts. How categorization is mediated at the neuronal level, which stimulus features are used, and how concepts emerge from categories remain open questions. These mechanisms have previously been reviewed (Soto and Wasserman 2014) and synthesized into a mechanistic hypothesis (Güntürkün et al. 2018). In the current review, we provide insights from the realm of birds into the behavior and neurobiology of visual perceptual categorization, focusing mainly on key developments of recent years.
Although we only review studies that used visual stimuli, there is strong evidence from experiments with human participants that categorization of visual and tactile objects generates highly similar veridical perceptual spaces, pointing to overlapping object categorization processes (Tabrik et al. 2021). Studies in corvids also show that auditory categorization follows principles highly similar to those of the visual system (Wagener and Nieder 2020).

The common elements model of categorization

The common elements model of categorization (Soto and Wasserman 2010, 2012) provides a theoretical and neurobiological framework that describes how the avian visual system parcellates objects into different categories and uses these representations to guide decision making. The model rests on two assumptions. First, objects belonging to a category are represented by a combination of shared perceptual features (the elements), and these elements have different probabilities of being diagnostic of a particular category. Elements with a high probability of diagnosing a particular category are shared among many, if not all, objects in that category, making these elements category-specific. In contrast, elements with a low probability of diagnosing a particular category are shared by few of the objects comprising the category, making these elements merely stimulus-specific. Second, the model assumes that connections between category-specific or stimulus-specific elements and behavioral responses are strengthened through error-driven learning, depending on their ability to predict reward. As learning is proportional to the reward-prediction error, only those stimulus-specific and category-specific elements that are predictive of reward come to control behavioral decisions.
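The learning dynamics the model assumes can be illustrated with a minimal delta-rule sketch. The toy element encoding, parameter values, and variable names below are our own illustration, not the authors' published implementation: stimuli are coded as sets of binary elements, some shared by every category member and some idiosyncratic, and element-to-response weights are updated in proportion to the reward-prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encoding (illustrative): elements 0-4 are category-specific
# (shared by every category member); elements 5-24 are stimulus-specific.
n_shared, n_specific, n_stimuli = 5, 20, 20

def make_stimulus(i):
    x = np.zeros(n_shared + n_specific)
    x[:n_shared] = 1.0                    # shared, category-diagnostic elements
    x[n_shared + i % n_specific] = 1.0    # one idiosyncratic element
    return x

w = np.zeros(n_shared + n_specific)       # element-to-response weights
alpha, reward = 0.1, 1.0                  # learning rate; every choice rewarded

for _ in range(500):
    x = make_stimulus(int(rng.integers(n_stimuli)))
    v = w @ x                             # predicted reward
    w += alpha * (reward - v) * x         # delta rule: reward-prediction error

# Shared elements, present on every rewarded trial, accumulate far more
# associative strength than any single stimulus-specific element.
```

Because the shared elements participate in every rewarded trial, they absorb most of the prediction error and come to dominate the learned response, exactly the property that makes them category-specific in the model.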

The common elements model is implemented as a simple hierarchical feedforward network (Riesenhuber and Poggio 2000; Serre et al. 2007) with alternating simple cell-like and complex cell-like layers, inspired by the architecture of the mammalian ventral visual stream. This pathway is a recurrent occipito-temporal network that connects early visual areas with the anterior inferior temporal cortex and shows diverse and clustered categorical selectivity for visual objects (Kravitz et al. 2013). In this architecture, layers of simple cells are interleaved with layers of complex cells, which combine the input of several units with similar selectivity but slightly different positions and scales. These non-linear operations between layers allow the network to extract increasingly specific and complex image features, mimicking the hierarchical computations known to occur along the pigeon tectofugal pathway (Li et al. 2007; Azizi et al. 2019). Some aspects of the model are not completely consistent with the physiology of the primate visual system. For instance, final-layer neurons do not show invariance and sparseness comparable to the inferior temporal cortex (Robinson and Rolls 2015). The model is, however, a reasonable approximation of the simple hierarchical processing operations that occur along the tectofugal pathway in the avian brain.
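A single simple/complex stage of such a network can be sketched in a few lines (the toy filters, image, and pooling size are hypothetical, not the model's actual parameters): simple cells perform template matching with oriented filters, and complex cells max-pool their outputs over neighboring positions, trading positional precision for invariance.

```python
import numpy as np

def simple_layer(image, filters):
    """Simple cells: template matching (valid 2D cross-correlation)."""
    h, w = filters.shape[1:]
    H, W = image.shape
    out = np.zeros((len(filters), H - h + 1, W - w + 1))
    for k, f in enumerate(filters):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[k, i, j] = np.sum(image[i:i + h, j:j + w] * f)
    return out

def complex_layer(s_maps, pool=2):
    """Complex cells: max over nearby positions -> local shift invariance."""
    k, H, W = s_maps.shape
    H, W = H - H % pool, W - W % pool
    return s_maps[:, :H, :W].reshape(k, H // pool, pool,
                                     W // pool, pool).max(axis=(2, 4))

# Two toy oriented edge detectors (horizontal- and vertical-edge filters).
filters = np.stack([np.array([[1., 1.], [-1., -1.]]),
                    np.array([[-1., 1.], [-1., 1.]])])
image = np.zeros((9, 9))
image[:, 4:] = 1.0                 # a vertical luminance edge

s = simple_layer(image, filters)   # localized edge responses
c = complex_layer(s)               # same features, at coarser positions
```

Stacking such pairs yields units whose preferred features grow more complex while their tolerance to position and scale grows, the core computation the model borrows from the ventral stream.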

For classification learning, complex units across the four complex-cell layers of the common elements model project directly to a reinforcement learning stage. This stage replicates the function of the dopaminergic system, which computes the reward-prediction error in conjunction with the prefrontal cortex (PFC) in mammals (Starkweather et al. 2018) and with the functionally analogous structure in birds, the nidopallium caudolaterale (NCL) (Packheiser et al. 2021). These operations are mediated by dopamine projections and constitute a key stage that enables the organism to select the appropriate category signal emanating from the PFC/NCL to make an appropriate motor response (Antzoulatos and Miller 2011; Puig and Miller 2012; Schultz 2016). Reward-driven feedback also allows the PFC/NCL to shape the responses of neurons in the visual cortex (Sasikumar et al. 2018). Soto and Wasserman (2012) showed that the common elements model captures most of pigeons’ behavioral performance in categorization tasks (e.g., size transformation, view interpolation, and surface feature removal). Interestingly, the model more closely approximated pigeon than human behavior in several of the experimental designs tested, aligning with evidence that pigeons show substantially less capacity to tolerate transformations of viewpoint, size, and location (Soto and Wasserman 2014). These findings suggest that some components of the primate visuo-spatial system, the PFC, and the extended memory systems that enable higher-order object recognition abilities (e.g., “mental rotation” and view interpolation) do not have equivalents in the pigeon brain. As we will discuss when we turn our attention to the avian visual cortex, these findings align with neurophysiological data suggesting that the pigeon visual system represents object features at an intermediate stage of complexity relative to primates (Clark and Colombo 2022; Clark et al. 2022a).
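How a reward-prediction error can link a category signal to the correct response can be caricatured with a small softmax action-selection loop. This is a hedged toy under our own assumptions (two category signals, two responses, deterministic reward), not the model's published reinforcement stage: action values for each category signal are updated by the prediction error, and this dopamine-like feedback gradually biases choice toward the rewarded response.

```python
import numpy as np

rng = np.random.default_rng(3)

Q = np.zeros((2, 2))          # Q[category signal, response]
alpha, beta = 0.2, 3.0        # learning rate, softmax inverse temperature

for _ in range(1000):
    cat = int(rng.integers(2))
    # Softmax over the two responses given the current category signal.
    p = np.exp(beta * Q[cat]) / np.exp(beta * Q[cat]).sum()
    resp = int(rng.choice(2, p=p))
    reward = 1.0 if resp == cat else 0.0          # correct response matches signal
    Q[cat, resp] += alpha * (reward - Q[cat, resp])  # prediction-error update

# After learning, the rewarded response dominates for each category signal.
```

The same prediction-error term that trains the element weights thus also selects among candidate motor responses, which is the gating role the text ascribes to the dopaminergic projections.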
We here use the term “avian visual cortex” to refer to the isocortex-like pallial components of the visual thalamofugal and tectofugal pathways (Stacho et al. 2020), as outlined below.

A short overview of the avian visual pathways

In 1943, the French ophthalmologist André Rochon-Duvigneaud described pigeons as little more than two eyes with wings. We humans are highly visual primates and view our surroundings through the information transmitted by about 1 million axons within each of our optic nerves. In pigeons, with their 2.5 g brain, this number stands at 2.3 million (Binggeli and Paule 1969). Pigeons also surpass humans in their ability to discriminate luminance (Hodos et al. 1985) and to discern subtle color differences (Emmerton and Delius 1980). Birds have exceptionally large eyes for their body size, and their cerebrum is enlarged by at least a factor of 10 relative to similarly sized fish and reptiles (Shimizu et al. 2017). Figure 1A exemplifies these features for pigeons. To enable the fine-grained analysis of object features, the avian retina is equipped with two specialized regions of high cone and ganglion cell densities that enhance spatial and temporal resolution (Bringmann et al. 2018). These two areas have different projections to neural structures and enable distinct analyses of the visual input (Remy and Güntürkün 1991; Güntürkün and Hahmann 1999; Clark and Colombo 2022). The avian tectofugal pathway, homologous to the mammalian extrageniculocortical pathway, is mainly responsible for both object and motion vision in the frontal visual field: as depicted in Fig. 1B, visual information travels from the eye to the midbrain optic tectum and thence to the nucleus rotundus in the thalamus. From there, the information flow enters the entopallium, one of the two primary visual areas of the telencephalon, and is further relayed to multiple higher visual associative forebrain areas. The thalamofugal pathway is homologous to the mammalian geniculocortical pathway and processes visual information from the lateral field of view.
The visual information from the retina travels via the nucleus geniculatus lateralis, pars dorsalis (GLd) in the thalamus to the visual Wulst in the telencephalon (Clark and Colombo 2020). These visual pathways divide and process information in a spatially parallel manner (Nguyen et al. 2004), using a cellular architecture of columnar local connections and horizontal layers in the hyperpallium and the dorsal ventricular ridge (DVR) that resembles the mammalian cortex in both its anatomical organization (Fig. 1C; Stacho et al. 2020) and its network structure (Fig. 1D).

Fig. 1

Schematic anatomical arrangement of the pigeon brain, the visual pathways, and the forebrain connectome. A Overlap of the MRI pigeon brain with the CT head data, showing the position of the pigeon’s brain. The eye and the surrounding skull are rendered as transparent structures; note the large eye and the enlarged cerebrum. B Sagittal depiction of the visual pathways in the pigeon brain. Birds have two main visual pathways, the tectofugal and thalamofugal systems, which correspond to the mammalian extrageniculocortical and geniculocortical pathways, respectively. C Sagittal view of the pigeon forebrain with a highly schematized depiction of the avian sensory cortex. D The connectome of the pigeon cerebrum in sagittal view; nodes are color-coded according to module membership, based on the study by Shanahan et al. (2013). Adapted from Güntürkün et al. (2021). A arcopallium, AD arcopallium dorsale, AI arcopallium intermedium, APH area parahippocampalis, Bas N. basalis prosencephali, CDL area corticoidea dorsolateralis, E entopallium, GLd nucleus geniculatus lateralis, pars dorsalis, HA hyperpallium apicale, HD hyperpallium dorsale, HF hippocampal formation, HI hyperpallium intercalatum, IHA N. interstitialis hyperpallii apicalis, L1–3 field L1–3, MD mesopallium dorsale, MV mesopallium ventrale, NCL nidopallium caudolaterale, NI nidopallium intermedium, Rt nucleus rotundus

Among other vertebrates, only mammals display a cerebrum as enlarged as that of birds, with primates possessing exceptionally high neuron densities, as do corvids and parrots (Kverková et al. 2022). Many mammals (particularly primates) also possess large eyes for their body size (Ross and Kirk 2007), a developed fovea (Provis et al. 1998), and greatly expanded visual processing networks in the telencephalon (Kaas et al. 2022; van Essen et al. 1992). These similarities between birds and primates mean that understanding the physiology of the avian visual system represents a unique opportunity to compare how similar principles of perception, motor control, and planning are implemented by neuronal hardware that differs from the mammalian cortex.

The thalamofugal pathway

The study of Stacho et al. (2020) demonstrated that the entire sensory pallium of birds, encompassing the pallial components of both the visual thalamofugal and the visual tectofugal systems, is characterized by columnar canonical iterative circuits that are highly similar in the thalamofugal and tectofugal regions. These circuits are thus largely identical across sensory systems and pallial areas (canonical) and are repeated in an identical way throughout the expanse of the sensory pallium (iterative). In addition, the canonical circuits of both the thalamo- and tectofugal systems are tangentially intersected by long-range associative axons that cross-connect all columns and link them to prefrontal, hippocampal, and (pre)motor structures (Fig. 1C). This cortex-like organization is only visible in the sensory pallium, while associative and motor areas are organized differently. The thalamofugal visual system terminates in the cortex-like territory of the Wulst (German for bulge or swelling), a laminated structure at the dorsal roof of the avian telencephalon that contains both a somatosensory and a visual processing region (Bischof et al. 2016; Pettigrew and Konishi 1976a, b; Wild 1987). The visual component of the Wulst receives projections from the GLd and, together with it, constitutes the thalamofugal visual pathway (Güntürkün and Karten 1991). The Wulst is functionally analogous to the primary visual cortex (V1) in many respects, such as displaying detailed retinotopic maps of visual space, selectivity for orientation/direction of motion, and small receptive field sizes (Bischof et al. 2016; Gusel'nikov et al. 1977; Revzin 1969). In predatory birds with frontally oriented eyes, such as owls, the cortex-like architecture of the Wulst is expanded, which may be related to their behavioral specializations.
In these birds, the Wulst plays an important role in computing binocular disparity (Nieder and Wagner 2001; Pettigrew and Konishi 1976a, b; Wagner and Frost 1993) and performs global shape analysis that goes beyond that of the primary visual cortex (V1; Nieder and Wagner 1999). The owl Wulst also displays clustered pinwheel arrangements of orientation-sensitive neurons, like the monkey and cat extrastriate cortex (Liu and Pettigrew 2003). Laterally eyed birds, such as pigeons, possess a much less differentiated Wulst lamination (Stacho et al. 2020) and no clustered pinwheel arrangements (Ng et al. 2010). The thalamofugal pathway in laterally eyed birds is more involved in the processing of distant stimuli viewed in the monocular visual field (Budzynski et al. 2002; Budzynski and Bingman 2004) and in spatial localization (Bischof et al. 2016; Watanabe et al. 2011).

The tectofugal pathway

The tectofugal pathway plays the dominant role in detailed pattern vision in laterally eyed birds. This is particularly true when stimuli are viewed nearby in the frontal binocular visual field, as is typically the case in an operant chamber (Güntürkün and Hahmann 1999; Remy and Güntürkün 1991). The differentiated network of 15 layers comprising the avian optic tectum highlights the tectofugal pathway's importance in both spatial attention (Marín et al. 2005) and stimulus perception (Neuenschwander et al. 1996; Neuenschwander and Varela 1993). The optic tectum displays a detailed retinotopic map of the visual field and a progressive increase in the complexity of response properties and receptive field sizes with increasing depth (Frost and DiFranco 1976; Luksch 2003). Layer 13 of the optic tectum projects to the thalamic nucleus rotundus, transforming the tectal retinotopy into a rotundal functionotopy for form, color, 2D motion, and looming (Laverghetta and Shimizu 1999; Wang et al. 1993; Hellmann and Güntürkün 2001). These modules project topographically to the pallium, which is composed of an inner region called the nidopallium and a more dorsal region called the mesopallium. The nidopallium contains the main projection zone of the tectofugal pathway, known as the entopallium (Husband and Shimizu 1999), which displays functional specializations for form/color and motion information along its anterior–posterior extent (Cook et al. 2013; Nguyen et al. 2004) and large receptive fields (Gu et al. 2002). The entopallium displays a topographic arrangement of cortex-like fiber connections oriented roughly perpendicular to the overlying intercalated nidopallium (NI) and mesopallium ventrolaterale (MVL) layers (Krützfeldt and Wild 2005; Stacho et al. 2020). These layers might be analogous to the mammalian extrastriate cortex (Butler et al. 2011; Karten 1969) and play a critical role in the categorization of complex visual stimuli.

In the following section, we will focus on the operation of the tectofugal projections in the telencephalon, as it is the best-understood cortex-like component of the visual system in birds in terms of its neurophysiology. These bottom-up visual computations form the basis of object, category, and abstract rule processing in birds, which in many tasks are executed at levels comparable to primates (Scarf et al. 2016; Veit and Nieder 2013).

The avian visual cortex—perceptual categorization

Recent investigations of the physiology of the avian sensory cortex have revealed that hierarchical information processing builds increasingly complex and abstract representations of visual stimuli in the pigeon brain. These mainly feedforward computations are very similar to the transformations of information observed across the mammalian ventral visual stream (Riesenhuber and Poggio 2000; Vinken et al. 2016). Arrays of neurons at higher stages of the processing hierarchy in mammals (such as the primate inferior temporal cortex) are both selective for complex shapes and relatively invariant to changes in lighting, distance, viewpoint, and spatial position (Bao et al. 2020; Freiwald and Tsao 2010; Gross and Schonen 1992; Wallis and Rolls 1997).

The entopallium is the first stage of hierarchical processing within the cortex-like architecture of the avian telencephalon; it receives thalamic input and forwards information to the overlying MVL and NI layers, which extract more complex features (Fig. 2A; Stacho et al. 2020; Clark and Colombo 2020). Consistent with the entopallium reflecting a relatively early stage of categorization, its neurons are selective for parameters such as stimulus size and the direction/speed of motion (Engelage and Bischof 1996; Gu et al. 2002), but the population responses do not distinguish well between images belonging to different stimulus categories (Fig. 2B; Azizi et al. 2019; Clark et al. 2022a). These features suggest that the entopallium may reflect an intermediate stage of processing in the common elements model hierarchy of simple and complex unit layers (Serre et al. 2007; Soto and Wasserman 2012), one that has not yet built sufficient receptive field invariance to discriminate between object categories. Figure 2B illustrates these hypothetical feature selection operations within the visual cortex of pigeons.

Fig. 2

Hypothetical wiring pattern of the avian visual cortex and its proposed role in feature extraction and processing. A Hypothetical hierarchical visual information flow within the visual tectofugal pallium. B Depiction of the hypothesized hierarchical feature selection operations at different levels of the visual DVR. At the level of the entopallium, a basic feature selection operation is performed on the visual input (here depicted by a “digital embryo”, Pusch et al. 2022). In our example, this operation is represented by the detection of edges that roughly correspond to the depicted orientations. At the stage of the entopallium, however, the processed visual information is not sufficient to signal information about an object category. At the next hierarchical level, information from several entopallial neurons converges on MVL neurons and is integrated. The resulting population code at the level of the MVL conveys information about the object category viewed by the pigeon. After a further computational step, the visual information of the MVL cells converges at the level of the NI and is transferred to higher associative areas

Azizi et al. (2019) demonstrated that the population response of the overlying MVL layer distinguished between the features of images depicting animate and inanimate stimuli with far greater accuracy than at the level of the entopallium, in a task that required the birds to peck the images for food reward without categorizing the stimuli. The visual features that the MVL population used to achieve categorization of the objects were also quite dissimilar from those of a simple V1-like model of Gabor filters, suggesting that MVL neurons represent stimulus features more abstract than edges at particular orientations. Clark et al. (2022a) used a different image set and found that the population responses in the MVL distinguished between the features of faces and scrambled faces with greater fidelity than the entopallium in a response-inhibition task that likewise did not require categorization (see Box 2 for further details). Interestingly, many MVL neurons respond strongly to scrambled images (Clark et al. 2022a, b), much like neurons in mammalian V1 (Vinken et al. 2016), suggesting that local edges are processed alongside more abstract stimulus features at higher stages of processing within the cortex-like layers (cf. Fig. 2A, B). A preference for intact objects over scrambled images emerges at the level of the NI (Clark et al. 2022b), suggesting that NI neurons sum the inputs of local orientation detectors at lower stages of processing to form receptive field filters that detect coarse low-spatial-frequency or complex shape features over a large area. The output of the NI layer is well situated to forward highly integrated visual information to the executive centers and memory systems of the avian brain (cf. Fig. 2A, B).
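What it means for a population response to “distinguish between categories” can be made concrete with a cross-validated decoder on synthetic data. The firing-rate model, signal strengths, and labels below are illustrative assumptions, not recorded data: a population whose mean response carries a category signal supports accurate readout, while one without such a signal decodes at chance.

```python
import numpy as np

rng = np.random.default_rng(1)

def population_responses(n_trials, n_neurons, category_signal):
    """Synthetic firing rates for two categories, with an optional mean shift."""
    labels = rng.integers(0, 2, n_trials)
    shift = category_signal * np.where(labels[:, None] == 1, 1.0, -1.0)
    rates = rng.normal(0.0, 1.0, (n_trials, n_neurons)) + shift
    return rates, labels

def nearest_centroid_accuracy(rates, labels):
    """Train a nearest-centroid decoder on one half, test on the other."""
    half = len(labels) // 2
    train_r, train_l = rates[:half], labels[:half]
    test_r, test_l = rates[half:], labels[half:]
    c0 = train_r[train_l == 0].mean(axis=0)
    c1 = train_r[train_l == 1].mean(axis=0)
    pred = (np.linalg.norm(test_r - c1, axis=1)
            < np.linalg.norm(test_r - c0, axis=1)).astype(int)
    return (pred == test_l).mean()

# "Entopallium-like": no category signal; "MVL-like": modest per-neuron signal.
ento, ento_l = population_responses(400, 50, category_signal=0.0)
mvl, mvl_l = population_responses(400, 50, category_signal=0.3)
acc_ento = nearest_centroid_accuracy(ento, ento_l)
acc_mvl = nearest_centroid_accuracy(mvl, mvl_l)
```

Even a weak per-neuron signal yields near-perfect population decoding when pooled over many neurons, which is why population-level analyses can reveal category information that single-cell selectivity measures miss.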

The avian ‘prefrontal area’

A global analysis of the architecture of the avian forebrain revealed a network organization remarkably similar to the mammalian connectome (Fig. 1D; Shanahan et al. 2013). In both groups of animals, distinct local networks are dedicated to different sensory modalities and to motor, limbic, and executive processes. These local networks are connected through central hubs, one of which is the prefrontal cortex. In birds, this hub corresponds to the nidopallium caudolaterale (NCL). This executive hub occupies a central position, with afferent and efferent projections to all associative, sensory, limbic, and premotor structures. While the NCL does not share the columnar circuitry of the sensory cortex (Stacho et al. 2020), several lines of evidence indicate that it is indeed the avian functional counterpart of the mammalian prefrontal cortex (Güntürkün et al. 2021). The NCL is usually identified as the part of the pallium with the richest dopaminergic innervation (Güntürkün 2005; von Eugen et al. 2020). Some of these dopaminergic terminals form ‘baskets’, dense encapsulations of individual perikarya that enable highly specific targeting of individual neurons (Waldmann and Güntürkün 1993; Wynne and Güntürkün 1995). This mode of innervation might serve a functional role in the unlaminated clusters of the avian NCL similar to that of layer-specific projections in the mammalian PFC. At the functional level, the similarity to the PFC was initially established with lesion and inactivation studies that reliably demonstrated NCL involvement in higher, more abstract processes such as the processing of behavioral rules (Güntürkün 1997a; Hartmann and Güntürkün 1998; Mogensen and Divac 1982; Diekamp et al. 2002a, b). These reports were confirmed by many neurophysiological studies that implicated the NCL in typical prefrontal functions (Güntürkün et al. 2021). To name a few examples, neural correlates of categorization (Kirsch et al. 2009; Ditz et al. 2022), working memory (Diekamp et al. 2002a, b; Hahn et al. 2021; Veit et al. 2014), executive control (Rose and Colombo 2005), reward processing (Koenen et al. 2013; Packheiser et al. 2021), numerosity (Wagener et al. 2018), rules (Veit and Nieder 2013), and even sensory consciousness, the ability to be aware of a sensory event (Nieder et al. 2020), have been discovered in the NCL. The neural ‘code’ found in the NCL also largely follows the same principles as neural representations in the PFC. In working memory, neurons (‘delay cells’) show evidence of active maintenance (Diekamp et al. 2002a, b), capacity limitations can be accounted for by divisive normalization, and neural oscillations are in line with modern bursting models of delay activity. In both the PFC and the NCL, neurons are tuned in a highly flexible, task-specific way (Rigotti et al. 2013). This ‘mixed selectivity’ enhances robustness and flexibility as well as the ability to represent highly abstract information.
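The computational payoff of mixed selectivity can be sketched in a few lines, a toy demonstration in the spirit of Rigotti et al. (2013) with hypothetical numbers: a linear readout cannot extract an XOR-like combination of two task variables from the variables themselves, but it can from a population of nonlinearly mixed units.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two binary task variables (e.g., stimulus A/B and rule 1/2) and an
# XOR-defined response -- not linearly separable from the raw variables.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0., 1., 1., 0.])

# "Mixed-selectivity" population: a random nonlinear expansion (tanh units).
W = rng.normal(0, 2, (2, 50))
b = rng.normal(0, 1, 50)
H = np.tanh(X @ W + b)

def readout_error(F, y):
    """Worst-case error of a least-squares linear readout (with bias)."""
    D = np.c_[F, np.ones(len(F))]
    w, *_ = np.linalg.lstsq(D, y, rcond=None)
    return np.abs(D @ w - y).max()

err_raw = readout_error(X, y)   # stays large: XOR is not linear in X
err_mix = readout_error(H, y)   # near zero with enough mixed units
```

Because each mixed unit responds to a nonlinear conjunction of variables, the population expands the representational space until a simple downstream linear readout suffices, one proposed reason why prefrontal-like areas favor this coding scheme.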

The avian ‘prefrontal area’—perceptual categorization

We can think of categorization as a process that can occur at different levels of abstraction from the physical stimulus properties (see Box 1 for formal definitions). The mammalian PFC and, correspondingly, the avian NCL become critical as abstraction increases. The location of the NCL within the avian pallial network allows the full integration of highly processed stimulus information from all modalities with limbic and, importantly, reward information. Unsurprisingly, neurons in the PFC show categorical responses, that is, they give a binary response even to a physically continuous stimulus. For instance, in a seminal experiment, Freedman et al. (2001) trained monkeys to categorize renderings of cats and dogs. The stimulus set consisted of gradual morphs between cats and dogs, such that the stimuli were physically continuous. While neurons in the inferior temporal cortex responded strongly to the physical ‘catness’ or ‘dogness’ of individual stimuli, prefrontal neurons gave a binary response of either cat or dog. In other words, prefrontal neurons did not represent the graded physical properties of the stimuli but only their category membership. The PFC is also able to respond flexibly to different categorizations. Interestingly, when the same stimulus set is categorized along different category boundaries, different groups of neurons represent the two categories (Roy et al. 2010). But when animals flexibly switch between categorizations involving different sets of stimuli, the category representations overlap in the same neural population (Cromer et al. 2010). This highlights the importance of the PFC not only for rule-based categorization but also shows that conflicting, physically ambiguous categories require greater prefrontal involvement.
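The contrast between graded and categorical coding can be expressed with two toy tuning functions (the boundary position and slope below are arbitrary illustrative values, not fitted to the Freedman et al. data): an IT-like unit tracks the physical morph level linearly, while a PFC/NCL-like unit responds with a steep sigmoid, so its responses differ little within a category but sharply across the boundary.

```python
import numpy as np

def it_like(m):
    """Graded code: response tracks the physical morph level ("dogness")."""
    return m

def pfc_like(m, boundary=0.5, k=40.0):
    """Categorical code: steep sigmoid around the learned category boundary."""
    return 1.0 / (1.0 + np.exp(-k * (m - boundary)))

# Morph level m runs from 0 (100% cat) to 1 (100% dog).
within = abs(pfc_like(0.6) - pfc_like(1.0))  # same side of the boundary: tiny
across = abs(pfc_like(0.4) - pfc_like(0.6))  # straddles the boundary: large
graded = abs(it_like(0.6) - it_like(1.0))    # graded code scales with distance
```

A categorical cell thus compresses within-category differences and expands between-category differences, exactly the signature that distinguishes prefrontal from inferior temporal responses in the morphing experiment.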

It is very likely that the category-selective response properties of NCL neurons are sculpted by reward and by the reward-driven dynamics of the strong dopaminergic input (von Eugen et al. 2020; Wynne and Güntürkün 1995) that activates local D1 receptors (Durstewitz et al. 1998). Their activation promotes synaptic stimulus–response associations (Herold et al. 2012) and signals the presence of predicted reward (Packheiser et al. 2021). In contrast, blocking D1 receptors levels out the differential learning effects of unequal reward magnitudes (Rose et al. 2009, 2013; Diekamp et al. 2000). Through the sum of these dopamine-mediated feedback signals, synaptic weights within cellular assemblies of the NCL are strengthened, making it increasingly likely that the animal will select the rewarded stimulus category (Güntürkün et al. 2018; Soto and Wasserman 2010).

The contributions of the asymmetric avian brain

Avian visual pathways reveal task-specific and complementary hemispheric asymmetries in chicken hatchlings, adult pigeons, and many other avian species (Güntürkün et al. 2020a, b). In both chicks and pigeons, the left hemisphere excels in the visual discrimination of object features like patterns or color (Güntürkün 1985; Rogers et al. 2007; Skiba et al. 2002), while the right hemisphere is superior in processing object configuration (Yamazaki et al. 2007), social cognition (Deng and Rogers 2002a; Nagy et al. 2010; Rugani et al. 2015), and spatial attention (Chiandetti 2011; Diekamp et al. 2005; Letzner et al. 2017). These asymmetries pay dividends, since birds with pronounced behavioral asymmetries fare better in foraging tasks (Güntürkün et al. 2000; Rogers et al. 2004). When testing pigeons on learning the category “human vs. non-human”, Yamazaki et al. (2007) demonstrated that the two hemispheres approach this challenge with complementary contributions: while the left hemisphere exploited the diagnostic value of tiny visual features, the right hemisphere concentrated on the overall configuration of the sought category. Indeed, Manns et al. (2021) showed in an elegant study that either hemisphere can take the lead during categorization, possibly depending on the perceptual strategy used.

When tested in conditioning chambers, pigeons use their frontal visual field to categorize stimuli. The stimuli are then perceived with the dorsotemporal retina, which is mainly represented in the tectofugal system (cf. Fig. 1B; Güntürkün and Hahmann 1999; Remy and Güntürkün 1991) and has a bias for local processing of object features (Clark and Colombo 2022). In contrast, the thalamofugal pathway seems to participate in the global processing of more distant objects in the pigeon's surroundings (Clark and Colombo 2022). Under ecological circumstances, therefore, both hemispheres likely complement each other during categorization when the entire visual field is used. Since the neurobiological studies discussed below mostly derive from experiments conducted in conditioning chambers, they probably primarily uncover the neural foundations of a left-lateralized superiority of visual feature coding in the context of perceptual categorization.

Structural and physiological asymmetries of the avian visual system have been investigated in both chickens (Adret and Rogers 1989; Costalunga et al. 2022; Deng and Rogers 2002b; Rogers and Sink 1988) and pigeons (Güntürkün et al. 1998; Manns and Ströckens 2014; Ströckens et al. 2013). The emergence of such asymmetries requires, at least in part, an asymmetrical epigenetic event during early development. Birds take an asymmetrical position in the egg such that the left eye of the embryo is covered by its own body while the right eye faces the eggshell. Every time the breeding adults stand up, light falls onto the eggs, traverses the eggshell, and primarily stimulates the right eye (Buschmann et al. 2006). This is the starting point of the right-eye/left-hemispheric superiority in visual object discrimination in birds (Manns 2021). Occluding the right eye with a patch before (Rogers and Sink 1988) or after hatching (Manns and Güntürkün 1999) reverses both behavioral and anatomical asymmetries. While chickens predominantly show asymmetries in the thalamofugal pathway, pigeons mainly show them in the tectofugal system (Güntürkün et al. 2020a, b). In the following, we focus on the situation in pigeons.

Within the tectofugal pathway, even the first central structures show morphological and neurochemical asymmetries, indicating that bottom-up signals are processed in a lateralized manner (Güntürkün 1997b; Manns and Güntürkün 1999, 2003). In addition, contralaterally projecting tectal fibers are more numerous from the right tectum to the left rotundus than vice versa (Letzner et al. 2020; Fig. 3A, label A). Figure 3 summarizes the different asymmetrical processing steps and highlights their anatomical underpinnings using labels (encircled letters A–D) that link the processing steps mentioned in the text to the figure.

Fig. 3

Anatomical depiction of the asymmetrically organized visual system of pigeons. Frontal view of the ascending tectofugal visual system along with the anterior commissure, the intrapallial projection of the tectofugal nidopallium intermediale (NI) onto the nidopallium caudolaterale (NCL) (C), and the projection of the NCL onto the motor arcopallium (D). The NCL receives input from the tectofugal system via the NI and feedback projections from the arcopallium (C). In addition, dopaminergic (DA) brainstem areas (shown in brown) modulate NCL and arcopallium activity patterns based on learning experiences. Note that the projection of the right tectum to the left rotundus entails more fibers (A), resulting in a more bilateral representation in the left rotundus and entopallium (B). Modified from Xiao and Güntürkün (2022)

This arrangement creates a more complete bilateral representation in the left rotundus, which is subsequently transferred to the left entopallium (Fig. 3, label B; Güntürkün et al. 1998; Letzner et al. 2020). This tectofugal asymmetry of visual representation has since been verified with behavioral (Güntürkün and Hahmann 1999; Valencia-Alfonso et al. 2009) as well as electrophysiological techniques at thalamic (Folta et al. 2004, 2007) and telencephalic levels (Verhaal et al. 2012; Xiao and Güntürkün 2022). However, it is important to keep in mind that the tectofugal visual system is not only a feedforward pathway but also includes feedback loops. For example, rotundal neurons also receive top-down pallial information that is relayed via the optic tectum. Folta et al. (2004) and Freund et al. (2016) revealed that left rotundal neurons were strongly modulated by top-down input from the visual Wulst, while those in the right rotundus were hardly modified by descending signals. This implies that mainly left-sided thalamic neurons receive feedback from higher visual areas such as the Wulst. This finding has two implications: first, it shows that the thalamo- and tectofugal pathways are not only parallel but also highly interconnected systems, an aspect that is often overlooked. Second, such a top-down asymmetry could allow experience-based telencephalic input to modify left-hemispheric thalamic neurons, selectively increasing the activity of those thalamic neurons that process category-relevant visual stimuli. Indeed, lateralized cortical top-down signals in human subjects modify activity patterns of downstream areas during categorization (Coutanche and Thompson-Schill 2015). This asymmetry of top-down control is altered by learning diagnostic stimulus features, which are then pre-activated in lower sensory areas (Sigala and Logothetis 2002; Ullman 2007). At the cellular level, the results in pigeons reveal that similar processes could occur already at the thalamic level and may modify left–right hemispheric differences in stimulus categorization.

Avian categorization at neural level—a mechanistic summary

Based on these results, we will now outline a hypothesis on the neuronal processes during visual feature discrimination in birds (Fig. 4). This hypothesis also incorporates a proposal on how the differently specialized hemispheres could switch between modes of interhemispheric competition and hemispheric cooperation.

Fig. 4

Hypothetical depiction of how visual perceptual categorization is realized in the asymmetrically organized visual system of pigeons. The categorization of humans (S+) vs. cars (S−) is used as an example. A Depiction of two neurons in the left and two in the right mesopallium ventrolaterale (MVL). Left MVL neurons respond to small diagnostic facial features, while right cells are activated by stimulus configurations that indicate the presence of a human face. B The tectofugal input inhibits “car”-coding NCL neurons via inhibitory interneurons and activates cells that code for the category “human”. Dopaminergic input signals the presence or absence of reward and can thus increase synaptic weights between assemblies consisting of tectofugal, NCL, and arcopallium neurons that are all activated when the pigeon pecks at a stimulus belonging to the category “human”. Circuitry according to Ditz et al. (2022). C The commissura anterior connects the arcopallia of both hemispheres. When the left hemisphere detects a stimulus of the “human” category based on diagnostic features, it can delay the action potentials of competing right arcopallial cells such that their motor output comes too late to control the choice of the animal

As outlined above, the two hemispheres make different contributions to the visual analysis of various stimuli. If a bird has to categorize, say, humans versus cars, left MVL cells will very likely exploit small diagnostic facial details such as eyes, nose, and mouth to categorize pictures of humans at the population level (Azizi et al. 2019; Koenen et al. 2016). In contrast, right MVL neurons will mainly respond to the configurations of the human body and face (Fig. 4A).

Ditz et al. (2022) developed a data-driven model of the dynamics of NCL microcircuits in crows working on a demanding numerical categorization task. According to their results, the appearance of a stimulus (say, a human) rapidly activates putative inhibitory interneurons that are broadly tuned to categories other than “human” and can thus exert a widespread and fast feedforward inhibitory effect on a large number of diversely tuned NCL projection neurons. As a result, network activity for, say, cars and other objects is dampened. After a short delay, putative projection neurons that respond to the appearance of a human stimulus are activated. In contrast to the interneurons, these cells are narrowly tuned and respond selectively to the sought stimulus class, while inhibitory interneurons in the vicinity of “human”-coding NCL neurons are inhibited. Through this arrangement, only cells that respond to the correct category remain active and control the response of the animal (Ditz et al. 2022) (Fig. 4B). The results of Ditz et al. (2022) in crows overlap nicely with those from pigeons. As in crows, single-unit recordings in the pigeon NCL and the (pre)motor arcopallium reveal that the inhibitory effect of the non-rewarded stimulus is faster and less precisely tuned than the excitatory effect of the rewarded one (Xiao and Güntürkün 2018, 2022). Thus, network dynamics are similar in crows and pigeons.
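The inhibition-then-excitation dynamics described above can be caricatured in a few lines of code. The following is a toy rate model, not the actual model of Ditz et al. (2022); all delays, weights, and the leak constant are invented for illustration. Broadly tuned inhibition arrives early and suppresses non-matching category pools, while narrowly tuned excitation arrives later and leaves only the correct pool active:

```python
CATEGORIES = ["human", "car", "tree"]

def simulate_ncl(stimulus_category, n_steps=20, inhib_delay=2, excit_delay=6):
    """Leaky-integrator firing rates for one projection-neuron pool per category."""
    rates = {c: [0.0] * n_steps for c in CATEGORIES}
    for t in range(1, n_steps):
        for c in CATEGORIES:
            drive = 0.0
            # Fast, broadly tuned feedforward inhibition hits all
            # pools coding for categories other than the stimulus.
            if t >= inhib_delay and c != stimulus_category:
                drive -= 1.0
            # Delayed, narrowly tuned excitation reaches only the matching pool.
            if t >= excit_delay and c == stimulus_category:
                drive += 2.0
            # Firing rates cannot go below zero.
            rates[c][t] = max(0.0, 0.8 * rates[c][t - 1] + drive)
    return rates

rates = simulate_ncl("human")
winner = max(CATEGORIES, key=lambda c: rates[c][-1])
print(winner)  # prints "human": only correct-category cells remain active
```

The key qualitative feature reproduced here is the temporal order: unspecific suppression first, selective activation second, so that by response time only the correct pool can control behavior.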

Data from pigeons outline how asymmetries of categorization are constituted. In both the NCL and the arcopallium, the speed of stimulus encoding during stimulus discrimination did not differ between the left and right hemispheres. In contrast, the cellular timing of action generation was faster in the left hemisphere: the majority of left-hemispheric neurons reached their maximal spiking frequency just before response execution, while right-hemispheric cells were slower and peaked too late to control the response of the animal. Thus, left-hemispheric neurons dominated the birds' behavior not through a higher categorization ability, but through their speed in monopolizing the execution of the decision. This critical left–right difference was realized by interhemispheric interactions via the commissura anterior, which connects the arcopallia of both hemispheres (Letzner et al. 2016). Xiao and Güntürkün (2018) showed that the left arcopallium delayed the peak activity time of contralateral right arcopallial neurons (Fig. 4C). As a result, the output of right-hemispheric cells often came too late to control the choice of the animal. Thus, interhemispheric interactions in birds do not simply activate or inhibit the other hemisphere but accelerate or decelerate cellular response speed there, thereby establishing unilateral control over the animal's decision (Xiao and Güntürkün 2018).
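The latency race sketched above can be illustrated with a minimal simulation. All latencies, the size of the commissural delay, and the first-past-the-post decision rule are illustrative assumptions, not values measured by Xiao and Güntürkün (2018):

```python
import random

def decide(left_peak_ms, right_peak_ms, commissural_delay_ms=30.0):
    """Return which hemisphere's arcopallial output controls the response.

    Both hemispheres encode the stimulus equally well, but the left
    arcopallium delays the peak activity of contralateral right cells,
    so left-side output usually arrives first (all numbers invented).
    """
    right_effective = right_peak_ms + commissural_delay_ms
    return "left" if left_peak_ms <= right_effective else "right"

random.seed(0)
# Identical peak-latency distributions in both hemispheres: the asymmetry
# arises purely from the commissural delay, not from better encoding.
trials = [decide(random.gauss(200, 20), random.gauss(200, 20)) for _ in range(1000)]
left_share = trials.count("left") / len(trials)
print(f"left hemisphere controls {left_share:.0%} of trials")
```

Note that the right hemisphere can still win a trial when its undelayed latency happens to be much shorter, mirroring the observation that right-hemispheric control is delayed rather than abolished.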

From categories to concepts

It makes sense to distinguish categories from concepts, although the boundary between them is gradual rather than sharp. In our view, a category is defined by overlapping perceptual features; these constitute the core of the common elements theory when applied to categories. Concepts, in contrast, are constituted by groups of stimuli that do not all share such perceptual features, yet humans and some other animals may still conceive of them as a common group.

The emergence of categories and concepts was recently investigated in a modeling study using a deep neural network (Henningsen-Schomers and Pulvermüller 2021). Here, visual features that are present in all stimuli of the sought category (e.g., shared visual features of pigeon breeds, Fig. 5 left) create the common elements of this category (overlapping dots shared by all stimuli). In addition, some elements are shared only by a subgroup of stimuli. The situation is different for abstract concepts: their features are never shared across all members of the concept, but only between subgroups of stimuli. Thus, as visible in Fig. 5 (right panel), the central zone of the concept is empty, while the overlapping zones between neighboring stimuli contain shared elements. This arrangement results in an intermediate state of feature overlap called family resemblance.
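The difference between a common-elements category and a family-resemblance concept can be made concrete with simple set operations. The feature labels below are invented for illustration; only the overlap structure matters:

```python
from functools import reduce

# Category: every member shares a common core ("beak", "wings").
category_pigeons = [
    {"beak", "wings", "grey"},
    {"beak", "wings", "white"},
    {"beak", "wings", "crest"},
]
# Concept: no single feature is present in all instances,
# but neighboring instances overlap pairwise.
concept_animals = [
    {"fur", "four_legs"},
    {"four_legs", "scales"},
    {"scales", "fins"},
]

def common_core(members):
    """Features shared by ALL members: the 'central connector hub' of Fig. 5."""
    return reduce(set.intersection, members)

def neighbors_overlap(members):
    """Family resemblance: each member shares features with at least one other."""
    return all(any(a & b for b in members if b is not a) for a in members)

print(common_core(category_pigeons))       # non-empty common core
print(common_core(concept_animals))        # empty set: no shared central zone
print(neighbors_overlap(concept_animals))  # prints True: only subgroup overlaps
```

The empty global intersection combined with non-empty pairwise overlaps is exactly the structure labeled "family resemblance" in the right panel of Fig. 5.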

Fig. 5

Schematic depiction of a hypothesis on how categories and concepts emerge. The left panel exemplifies the category “pigeon”. Each individual category member is characterized by a set of idiosyncratic features; these unique stimulus elements render the different pigeons identifiable. Further, some individuals might share visual aspects, leading to subgroup features. However, the defining component of the category “pigeon” is the set of overall shared features, i.e., visual aspects that are common to all category members. These shared features lead to a robust representation of the category based on visual similarity. The right panel depicts the situation for the concept “animal”, a concept that pigeons learn with some additional training (Roberts and Mazmanian 1988). As for categories, unique features characterize each individual instance of the concept, and several instances might share visual subgroup features. However, no features are shared by all instances of the concept. As a result, no combined representation based on shared visual similarity is possible; instead, family resemblance emerges from the presence of multiple subgroup features across all stimuli

After training the network with instances of such category members, the emerging cell assemblies were investigated. The authors found that stimuli belonging to a perceptual category (left side of Fig. 5) were represented in cell assemblies that coded category-defining features in the neural network's central connector-hub area. This result arises because units coding for shared features are activated most frequently, leading to a relative suppression of the neurons responsive to unique features. If the common core is sufficiently activated, the categorical cell assembly ignites as a whole, resulting in strong persistence throughout task execution. In parallel, the representations of unique features, or of subgroup features shared by only a few category members, might fade, resulting in an overshadowing of these features.

This is different for concepts. Here, no jointly shared features exist; instead, a larger number of neurons code for elements that are shared by only subgroups of the concept. It is precisely the sum of all these subgroup features that could represent a concept. The stronger reliance on subgroup-feature neurons in the case of concepts creates the “family resemblance” and its contextual dependency. Indeed, in humans, abstract concepts rely much more on their contextual embedding than perceptual categories do (Schwanenflugel et al. 1988, 1992).

These modeling results (Henningsen-Schomers and Pulvermüller 2021) fit well with behavioral data showing that perceptual categories whose members share many stimulus details are easier to learn and categorize than more abstract categories at the superordinate level (see Box 1 for formal definitions; e.g., Lazareva 2004). Further, pigeons have severe problems mastering choice tasks using polymorphous concepts, i.e., stimuli that are defined as category members if they contain m-out-of-n stimulus features (Lea et al. 2006; von Fersen and Lea 1990). One explanation of this behavioral finding might be that such concepts require more training because of the neurocomputational demands of learning a group structure that lacks a central connector hub with common elements (Henningsen-Schomers and Pulvermüller 2021). If learning a concept requires the acquisition of a large number of context-dependent subgroups of features that jointly create the concept, it is easy to see that animals with more pallial neurons should learn concepts faster (Wright et al. 2003; Güntürkün et al. 2017). This might also explain why crows, with their much larger number of associative pallial neurons, master these kinds of tasks with ease, while pigeons have a hard time (Veit and Nieder 2013; Ströckens et al. 2022). In conclusion, sufficient training and computational power in associative brain structures might enable abstract concepts to evolve in various animal species.
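The m-out-of-n structure of a polymorphous concept mentioned above can be stated compactly. The feature names and the choice of m = 2, n = 3 below are illustrative; the point is that no single feature is necessary or sufficient for membership:

```python
# Hypothetical defining features of a polymorphous category (n = 3).
DEFINING = {"round", "red", "striped"}

def is_member(stimulus_features, m=2):
    """2-out-of-3 rule: a stimulus is a member iff it shows at least m
    of the n defining features."""
    return len(DEFINING & stimulus_features) >= m

print(is_member({"round", "red", "small"}))   # True: 2 of 3 features present
print(is_member({"round", "blue", "small"}))  # False: only 1 defining feature
```

Because members can satisfy the rule via entirely different feature pairs, the stimulus set has no common core, which is exactly the connector-hub-free structure that the modeling results suggest is computationally demanding to learn.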

The forthcoming frontiers

The synopsis of recent findings from anatomical studies, behavioral experiments, electrophysiological recordings, and modeling attempts allows the formulation of a coherent theory of perceptual categorization and concept formation. These theoretical implications now need to be experimentally verified. In parallel, several methodological aspects might be worth considering in future experiments on perceptual categorization and concept formation.

At the behavioral level, several algorithms to generate stimuli have been introduced that are geared to probe the critical features the animals use for categorization (Apostel and Rose 2021; Hegdé et al. 2008; Pusch et al. 2022). These stimuli represent artificial yet naturalistic objects that are free of human semantics but based on class-distinguishing features that can be tracked by the experimenter. Taking this approach a step further, genetic algorithms are used to adaptively change stimulus features within an experimental session, for instance to find optimal stimulus parameters for the animals (Qadri and Cook 2021). Such stimuli also allow near-perfect control over the statistics of the stimuli that define a category and might help to uncover the aspects, elements, and features that guide the animals' choice behavior.
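The adaptive-stimulus idea can be sketched as follows. This is not the procedure of Qadri and Cook (2021); the simulated response function, the selection scheme, and all parameters are invented to illustrate how a genetic algorithm could evolve stimulus parameters toward those an animal responds to most strongly:

```python
import random

# Stand-in for the animal's (in reality unknown) preferred stimulus parameters.
TARGET = [0.7, 0.2, 0.9]

def response(stimulus):
    """Simulated behavioral response: larger (less negative) when the
    stimulus parameters are closer to the animal's preference."""
    return -sum((s - t) ** 2 for s, t in zip(stimulus, TARGET))

def evolve(pop_size=30, generations=60, sigma=0.05, seed=1):
    """Evolve a population of 3-parameter stimuli by truncation selection,
    mean crossover, and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=response, reverse=True)
        parents = pop[: pop_size // 2]  # keep the best responders unchanged
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            children.append([(x + y) / 2 + rng.gauss(0, sigma)
                             for x, y in zip(a, b)])
        pop = parents + children
    return max(pop, key=response)

best = evolve()
print([round(v, 2) for v in best])  # converges toward the preferred parameters
```

In an actual experiment, the fitness evaluation would of course be the animal's measured behavior on each generation of stimuli rather than a closed-form function.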

The level of analysis might also benefit from the inclusion of additional behavioral parameters. One approach used in recent experiments is peck-tracking. Similar to human eye tracking, the location of the pecks with which pigeons signal their choice can be used as a proxy for the pigeon's visual attention. Indeed, it has been shown that pigeons learning to categorize visual stimuli allocate their attention to the predictive features of the stimuli, reflected in an increased pecking rate on these stimulus aspects (Castro et al. 2021; Castro and Wasserman 2017; Dittrich et al. 2010; Pusch et al. 2022). In combination with the aforementioned stimulus material, this information might further aid our understanding of which stimulus features gain control over the elicited behavior.

This principle can be extended far beyond peck-tracking. Modern video analysis, such as markerless pose estimation, allows tracking of behavioral aspects that were previously difficult to incorporate systematically into a detailed analysis (for example using DeepLabCut: Nath et al. 2019; Wittek et al. 2022). All these approaches reduce experimenter bias and can reveal details that are not obviously visible in aggregated data, enabling an ecologically valid and unbiased behavioral analysis (Anderson and Perona 2014).

On the neurophysiological level, the analysis of the supposed neural computations within the sensory aspects of the dorsal ventricular ridge (DVR), a large pallial collection of nuclei that bulges below the lateral ventricle, and of their connections with the NCL constitutes a core future question. But these questions extend beyond the areas covered in this review and should incorporate further key areas such as the striatum and hippocampus, both of which very likely are key contributors to categorization learning. Recent approaches, such as visual discrimination learning in awake and actively working pigeons tested in ultrahigh-field imaging systems, could aid these analyses by visualizing at high resolution all cerebral areas that participate in certain task components (Behroozi et al. 2020). This further highlights the fact that categorization, like all cognition, cannot be understood at the level of individual neural structures but must be seen as a network process. High-density methods, such as electrophysiological recordings with silicon probes, allow parallel data collection from the entire stacked avian visual cortex, or even bilaterally from visual and prefrontal structures simultaneously. The data generated with these approaches allow analysis of the temporal dynamics and population-level processes within and between the different nodes of the network. Such critical tests might further discern the network-level processes that underlie categorization and concept formation. Methods such as optogenetic stimulation and inhibition (Deisseroth 2011; Rook et al. 2021) complement this approach by allowing causal interventions, for example targeting top-down processes in perceptual categorization.

Last but not least, the differences in concept learning between pigeons and crows exemplify important species differences within the avian class. These differences should be turned into heuristic opportunities that enable us to see how ecological embedding and neural specialization affect the different components of avian cognition. This is only possible if a larger number of avian species is tested.

Taken together, theoretical implications as well as methodological and conceptual advancements provide the opportunity for future experiments that will broaden our understanding of perceptual categorization in birds.