Rapid and coarse face detection: With a lack of evidence for a nasal-temporal asymmetry

Cabral, Laura; Stojanoski, Bobby; Cusack, Rhodri

doi:10.3758/s13414-019-01877-3

Open access
Published: 06 January 2020

Volume 82, pages 1883–1895, (2020)
Attention, Perception, & Psychophysics

Laura Cabral¹, Bobby Stojanoski¹ & Rhodri Cusack¹,²

Abstract

Humans have structures dedicated to the processing of faces, which include cortical components (e.g., areas in occipital and temporal lobes) and subcortical components (e.g., superior colliculus and amygdala). Although faces are processed more quickly than stimuli from other categories, there is a lack of consensus regarding whether subcortical structures are responsible for rapid face processing. In order to probe this, we exploited the asymmetry in the strength of projections to subcortical structures between the nasal and temporal hemiretina. Participants detected faces from unrecognizable control stimuli and performed the same task for houses. In Experiments 1 and 3, at the fastest reaction times, participants detected faces more accurately than houses. However, there was no benefit of presenting to the subcortical pathway. In Experiment 2, we probed the coarseness of the rapid pathway, making the foil stimuli more similar to faces and houses. This eliminated the rapid detection advantage, suggesting that rapid face processing is limited to coarse representations. In Experiment 4, we sought to determine whether the natural difference between the spatial frequencies of faces and houses was driving the effects seen in Experiments 1 and 3. We spatially filtered the faces and houses so that they were matched. Better rapid detection was again found for faces relative to houses, but we found no benefit of preferentially presenting to the subcortical pathway. Taken together, the results of our experiments suggest a coarse rapid detection mechanism, which was not dependent on spatial frequency, with no advantage for presenting preferentially to subcortical structures.

Introduction

Animals as diverse as fish, birds, and sheep can recognize the faces of their conspecifics (Leopold & Rhodes, 2010). In humans there has evolved a network of structures responsible for face processing that facilitates face detection, orientating, and identification (Haxby, Hoffman, & Gobbini, 2000; Mende-Siedlecki & Verosky, 2013; Tong, Nakayama, Moscovitch, Weinrib, & Kanwisher, 2000). This comprises subcortical components, including the superior colliculus and amygdala (Mende-Siedlecki & Verosky, 2013; Vuilleumier, Armony, Driver, & Dolan, 2003), and cortical components in the occipital and temporal lobes (Kanwisher, McDermott, & Chun, 1997; Kanwisher & Yovel, 2006; Pitcher, Dilks, Saxe, Triantafyllou, & Kanwisher, 2011). These specialized processing mechanisms allow faces to be detected more quickly than objects (Crouzet, Kirchner, & Thorpe, 2010) and result in faces being the first category to be detected in visual search tasks (Fletcher-Watson, Findlay, Leekam, & Benson, 2008). Detecting faces quickly is thought to be evolutionarily advantageous for both survival and social interaction, from the savannahs of Africa to the office party.

The subcortical route via the retinocollicular pathway to the amygdala is often thought to facilitate “quick and dirty” face detection (Johnson, 2005). It comprises projections from the retina to the superior colliculus, which in turn project to the pulvinar nucleus on the way to the amygdala (Benevento & Standage, 1983; Jones & Burton, 1976; Rafal et al., 2015; Tamietto, Pullens, De Gelder, Weiskrantz, & Goebel, 2012). Evidence that the retinocollicular pathway can process faces comes from blindsight patients, who after extensive damage to the visual cortex are still able to detect the emotional content of faces, although they cannot recognize their identity (Tamietto & de Gelder, 2010). Similar behavior is found in healthy controls following transcranial magnetic stimulation (TMS) to the visual cortex; when TMS prevents participants from seeing stimuli, they are still able to recognize the emotional content of the face (Jolij & Lamme, 2005). Furthermore, structures in the retinocollicular pathway are activated by the viewing of neutral and emotional faces, as shown with functional magnetic resonance imaging (fMRI) (Mende-Siedlecki & Verosky, 2013). fMRI has also shown that this pathway has a preference for crude, low-spatial-frequency information, with greater activation to faces filtered to emphasize low spatial frequencies than high spatial frequencies (Vuilleumier et al., 2003).

Intracranial recordings in epilepsy patients have found that the retinocollicular pathway is fast, with neural firing in the amygdala as quickly as 100–250 ms after the presentation of an emotional face (Sato et al., 2013). Recent intracranial recording from Méndez-Bértolo et al. (2016) has found even faster processing for fearful faces, with firing in the amygdala recorded 74 ms after stimulus onset. Magnetoencephalography (MEG) data suggest even faster processing, with responses to emotional faces detected in just 40 ms (Luo et al., 2010). Supporting this hypothesis, Garvert, Friston, Dolan, and Garrido (2014) used dynamic causal modeling of MEG data to conclude that a model with a subcortical component, containing the pulvinar nucleus and the amygdala, more accurately modeled rapid face processing than a model with a singular cortical process.

It has been proposed that these putative fast face detection mechanisms are not limited to subcortical structures, as there is also evidence of rapid mechanisms within cortical areas, such as the inferior occipital gyrus (Pitcher, Walsh, Yovel, & Aviv, 2007; Sadeh, Podlipsky, Zhdanov, & Yovel, 2010). Specifically, an initial feed-forward wave of firing through the cortex could allow for rapid, coarse processing (Cauchoix & Crouzet, 2013; Serre, Oliva, & Poggio, 2007; VanRullen & Koch, 2001). Electroencephalography (EEG) data from the visual cortex can identify responses just 56 ms after stimulus onset (Foxe & Simpson, 2002), and intracranial recordings in epilepsy patients found that the category of image participants were viewing could be decoded from the first 100 ms of response in visual cortex (Liu, Agam, Madsen, & Kreiman, 2009). MEG data suggest occipitotemporal responses to faces in just 100 ms (Liu, Harris, & Kanwisher, 2002). Barragan-Jason, Cauchoix, and Barbeau (2015) have proposed that even the identification of familiar faces has an initial rapid phase, occurring at 140 ms, that depends on coarse visual information, and behavioral responses to familiar faces can be detected in just 180 ms (Visconti di Oleggio Castello & Gobbini, 2015). To formalize how the cortex could rapidly detect complex visual objects such as faces in real-world scenes, Thorpe and colleagues (Delorme & Thorpe, 2001; VanRullen, Guyonneau, & Thorpe, 2005) proposed a spike-based model of rapid processing. These models have been supported by recordings from V1 in the macaque and cat (Celebrini, Thorpe, Trotter, & Imbert, 1993; Konig, Engel, Roelfsema, & Singer, 1995; VanRullen et al., 2005).

In summary, although the view is not without its critics, many authors have argued for both subcortical and cortical mechanisms for rapid visual processing of faces. Which one, therefore, dominates rapid face detection in healthy participants? One way to address whether rapid face perception is driven by subcortical structures is to target the retinocollicular pathway to the amygdala. Presenting stimuli exclusively to the nasal hemiretina preferentially targets the retinocollicular pathway, as the nasal hemiretina contains more fibers projecting to the superior colliculus. Initial evidence for this asymmetry came from tree shrews, cats, and macaques (Conley, Lachica, & Casagrande, 1985; Harrison, 2015; Perry & Cowey, 1985; Pollack & Hickey, 1979; Sterling, 1973). fMRI evidence in humans has demonstrated that the superior colliculus displays a nasal-temporal asymmetry that is not found for the LGN or V1 (Sylvester, Josephs, Driver, & Rees, 2007). Additionally, behavioral studies have demonstrated that a nasal-temporal asymmetry is reflective of input to the superior colliculus. For example, making stimuli only visible to the S cones, which do not provide input to the superior colliculus, eliminates the benefit of presenting to the nasal hemiretina (Bertini, Leo, & Làdavas, 2008).

Our goals in this study were to establish a paradigm for behaviorally quantifying rapid face detection, and to determine whether presenting preferentially to the retinocollicular pathway resulted in improved rapid face detection. Participants were asked to detect faces from amongst unrecognizable control stimuli that were matched to have the same low-level visual features, as quantified with a model of the early visual system (Stojanoski & Cusack, 2014). To determine whether any rapid detection mechanism was specific to faces, we also tested a control condition, requiring detection of another class of visual object, houses.

Experiment 1

Methods

To probe rapid face processing, in two blocks participants performed a face-detection task in which they pressed a button as quickly as possible for intact faces, but not for scrambled foil stimuli. In two additional blocks, they were asked to detect houses in a similar manner. In each block, stimuli were presented monocularly, by asking participants to wear an eye patch. This allowed us to target stimuli exclusively to either the nasal or temporal hemiretina. In the right eye, presenting stimuli to the right of fixation targets the nasal hemiretina, while presenting to the left of fixation targets the temporal hemiretina. The opposite is true in the left eye. Within each block, stimuli were randomized across the nasal and the temporal hemiretinas.
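The mapping between the unpatched eye, the side of fixation on which a stimulus appears, and the hemiretina it reaches can be written out as a small lookup. The following Python sketch is illustrative only; the function name and string labels are hypothetical and are not taken from the authors' MATLAB code.

```python
def hemiretina(viewing_eye: str, stimulus_side: str) -> str:
    """Return the hemiretina targeted by a stimulus shown left or right of fixation.

    viewing_eye:   "left" or "right" (the unpatched eye)
    stimulus_side: "left" or "right" of the fixation cross
    """
    valid = ("left", "right")
    if viewing_eye not in valid or stimulus_side not in valid:
        raise ValueError("viewing_eye and stimulus_side must be 'left' or 'right'")
    # A stimulus on the same side as the viewing eye lies in that eye's
    # temporal visual field, which projects onto the nasal hemiretina.
    return "nasal" if viewing_eye == stimulus_side else "temporal"
```

With the right eye unpatched, a stimulus to the right of fixation maps to the nasal hemiretina, and the assignment reverses for the left eye, as described above.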

Participants

Twenty-four individuals (12 males, 12 females, age range 18–21 years) were given course credit for participation in Experiment 1. The non-medical ethics board at the University of Western Ontario reviewed and approved the experimental protocol. All participants provided informed consent, reported normal or corrected-to-normal vision, and were right-handed.

Stimuli

Twenty-four face photographs from an online database (http://wiki.cnbc.cmu.edu/Face_Place) and 24 house stimuli, created by Martin, McLean, O’Neil, and Köhler (2013), were used in the study. As the house stimuli had a blurred edge, a custom MATLAB (Mathworks, Natick, MA, USA) script added a blurred edge to the face stimuli, so that the two stimulus sets appeared similar by eye. As the house stimuli were grayscale, face stimuli were also converted to grayscale.

All stimuli were centered in a rectangular area of 4.9° × 4.9° of visual angle. The fixation cross was .5° × .5°. A white background was used throughout the experiment. In all experiments, participants viewed the stimuli in a room with the lights on. To generate the control stimuli, faces and houses were diffeomorphically warped using the procedure described by Stojanoski and Cusack (2014). Foils were unrecognizable as determined by the behavioral ratings in Stojanoski and Cusack (2014) (image 38 on the diffeomorphic continuum). A depiction of the stimuli used in Experiment 1 can be found in Fig. 1A. Further details about the stimuli can be found in Supplementary Fig. 1.
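Stimulus sizes are given in degrees of visual angle, so reproducing them on a particular display requires the viewing distance, which is not stated in this section. The sketch below shows the standard conversion in Python; the function names are hypothetical, and the viewing distance must be supplied by the reader as an assumption about their own setup.

```python
import math


def size_deg_to_cm(angle_deg: float, viewing_distance_cm: float) -> float:
    """On-screen extent (cm) that subtends angle_deg when centered on the line of sight."""
    return 2.0 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_deg_to_cm(angle_deg: float, viewing_distance_cm: float) -> float:
    """Distance from fixation (cm) on a flat screen for a point at angle_deg eccentricity."""
    return viewing_distance_cm * math.tan(math.radians(angle_deg))


if __name__ == "__main__":
    distance_cm = 57.0  # assumed value for illustration only; not reported in the text
    print(size_deg_to_cm(4.9, distance_cm))   # width of the stimulus area
    print(size_deg_to_cm(0.5, distance_cm))   # width of the fixation cross
```

Dividing the result by the physical pixel pitch of the display (cm per pixel) gives the size in pixels.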

Procedure

Stimuli were presented on a laptop screen using MATLAB and Psychtoolbox. Participants wore an eye patch to ensure monocular presentation, placed their heads on a chin rest, and were instructed to maintain fixation. The center of the screen was directly ahead of the nose. In each experimental block, a black fixation cross was offset by 3.2 cm to the left or right from center in order to put it directly in front of the unpatched eye. This distance was chosen using the mean interpupillary distance scores from the 1988 Anthropometric Army Survey.

In Experiment 1, participants completed two blocks with their left eye unpatched, one that contained only face targets, the other containing house targets, and two similar blocks with their right eye unpatched. Block order was counterbalanced across participants.

In each block, participants were presented with 96 trials comprising two repetitions of 24 target stimuli and their 24 warped counterparts. One repetition was presented to the nasal hemiretina, while the other was presented to the temporal hemiretina. To present to the nasal and temporal hemiretinas, the stimuli were offset horizontally so that the outer edge of their rectangular bounds was 8° from the center of fixation. Stimuli were presented for a duration of 122 ms, with an intertrial interval of 2,505 ms. Participants were instructed to perform a simple detection task, pressing a key as quickly as possible when they saw an intact face (in the face blocks) or an intact house (in the house blocks). For a schematic representation of the experimental configuration, please see Fig. 2.
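The trial count follows from the design: 24 exemplars × 2 versions (intact, warped) × 2 hemiretinas = 96 trials per block. A minimal Python sketch of such a randomized trial list is given below. The function and field names are hypothetical, and simple shuffling is an assumption, since the text does not describe any further constraints on trial order.

```python
import random


def build_block(n_exemplars: int = 24, seed=None):
    """Build one block: every exemplar appears intact and warped in each hemiretina."""
    rng = random.Random(seed)
    trials = [
        {"exemplar": exemplar, "intact": intact, "hemiretina": side}
        for exemplar in range(n_exemplars)
        for intact in (True, False)          # target versus diffeomorphically warped foil
        for side in ("nasal", "temporal")    # one repetition per hemiretina
    ]
    rng.shuffle(trials)  # randomize order within the block (assumed unconstrained)
    return trials


if __name__ == "__main__":
    block = build_block(seed=0)
    print(len(block))  # 24 * 2 * 2 trials
```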

Analysis

In order to quantify rapid processing, we used an analysis strategy similar to Kirchner and Thorpe's (2006) and calculated accuracy for the fastest 10% of responses. All reaction times (RTs) are relative to stimulus onset. A fast detection mechanism would be expected to improve accuracy on these rapid trials by providing more accurate information to decision and action areas sooner after stimulus onset. The RT threshold for the fastest 10% of trials was calculated for each participant individually, in order to account for individual differences in overall reaction time. We also expected that faces would be detected more quickly overall. To ensure that any such overall difference in reaction time between faces and houses did not drive the results, we adopted a conservative analysis strategy and determined the face and house reaction time thresholds separately. Thus, the fastest 10% of face trials were expected to be even faster than the fastest 10% of house trials.
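The fastest-10% analysis can be sketched as follows for a single participant and a single stimulus category. This Python example is an illustration under stated assumptions, not the authors' code: accuracy is taken to be the proportion of the fastest responses that were hits rather than false alarms, the 10% cut is rounded to the nearest whole response, and ties at the threshold are not treated specially. All names are hypothetical.

```python
def rapid_accuracy(responses, fraction=0.10):
    """Accuracy within the fastest `fraction` of responses.

    responses: list of (rt_ms, is_hit) pairs for one participant and one
               category, covering only trials on which a key was pressed
               (hits and false alarms).
    Returns (rt_threshold_ms, accuracy).
    """
    if not responses:
        raise ValueError("no responses to analyze")
    ordered = sorted(responses, key=lambda r: r[0])
    # Assumption: round the cut to the nearest whole response, keeping at least one.
    n_fast = max(1, round(len(ordered) * fraction))
    fastest = ordered[:n_fast]
    rt_threshold = fastest[-1][0]  # slowest RT still inside the fast window
    accuracy = sum(1 for _, is_hit in fastest if is_hit) / n_fast
    return rt_threshold, accuracy
```

Calling the function separately on the face responses and the house responses of each participant gives the category-specific thresholds described above.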

To determine the contribution of the retinocollicular pathway, we examined whether presenting the stimuli to the nasal or the temporal hemiretina modulated performance. As the nasal hemiretina has more connections to the superior colliculus and thus the retinocollicular pathway, we would expect to see faces more accurately detected than houses when the stimuli are presented to the nasal hemiretina.

Results

Two participants were excluded for failing to follow the task instructions. Across the remaining participants, mean RTs for both the fastest 10% and the slowest 50% of trials are shown in Fig. 3A. These reaction times include correct responses and false alarms, as both contributed to subsequent accuracy metrics.

To probe rapid mechanisms, analyses were confined to trials with a rapid response in the fastest 10% of RTs for each category. Participants were able to more accurately detect faces than houses (F(1,21)=10.41, p<0.01) (Fig. 4A). This shows that our paradigm is sensitive to rapid, accurate face detection. We then turned to the effect of the retinal hemifield manipulation. There was no overall benefit of presenting stimuli to a particular hemiretina (F(1,21)=3.87, p>0.05), suggesting no general role for the retinocollicular pathway in fast visual detection. Furthermore, contrary to what would be expected if the retinocollicular pathway were category selective and supported rapid face detection, there was no significant stimulus by retinal hemifield interaction (F(1,21)=0.1, p>0.05) (Fig. 5A). In fact, there was a trend for better performance for faces in the temporal hemiretina.
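The reported F(1,21) values are consistent with a 2 × 2 within-subject design (category × hemiretina) across 22 participants, although the analysis software is not named here. Assuming that design, each effect has one degree of freedom and its F value equals the squared one-sample t statistic computed on a per-participant contrast score. The Python sketch below illustrates this equivalence; the function names and the input layout are hypothetical.

```python
import math
from statistics import mean, stdev


def contrasts_2x2(cells):
    """Per-participant contrast scores for a 2 x 2 within-subject design.

    cells: list of (face_nasal, face_temporal, house_nasal, house_temporal)
           accuracies, one tuple per participant.
    """
    category = [(fn + ft) / 2 - (hn + ht) / 2 for fn, ft, hn, ht in cells]
    side = [(fn + hn) / 2 - (ft + ht) / 2 for fn, ft, hn, ht in cells]
    interaction = [(fn - ft) - (hn - ht) for fn, ft, hn, ht in cells]
    return category, side, interaction


def within_subject_f(contrast_scores):
    """F and its degrees of freedom for a one-df within-subject contrast (F = t squared)."""
    n = len(contrast_scores)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = stdev(contrast_scores)
    if sd == 0:
        raise ValueError("contrast scores have zero variance")
    t = mean(contrast_scores) / (sd / math.sqrt(n))
    return t * t, (1, n - 1)
```

With 22 participants the function returns degrees of freedom (1, 21), matching the values reported above.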

Interim discussion

The results of Experiment 1 demonstrate that there is a rapid route for detecting faces that does not extend to other classes of stimuli (i.e., houses). As there was no benefit for presenting stimuli to the nasal hemiretina, the results of the experiment did not provide any evidence of a role for the retinocollicular pathway in rapid visual detection or rapid face processing. The lack of contribution from the retinocollicular pathway, taken with the trend for better processing in the temporal hemiretina, suggests that a cortical route could be responsible for the rapid face detection seen in the experiment.

Our next goal was to probe the specificity of the rapid pathway. A key feature of the rapid route discussed in the literature is that it is not just quick, but that it is dirty (i.e., a coarse representation). In an evolutionary context, it might be advantageous for neural structures to obtain extremely quick, coarse representations of the faces in the environment. This route is not thought to be capable of fine discrimination. Thus, the next experiment was designed to probe the precision of the rapid detection mechanism identified in Experiment 1.