Introduction

Attention is one of the most remarkable abilities of the human brain, ensuring that precious neural resources are focused on a subset of sensory items according to their intrinsic salience and their relationship to our current goals and needs. Most contemporary theories posit that attentional selection is the outcome of biased competition between preferentially enhanced activity elicited by attended objects and reduced activity to non-attended objects (Desimone and Duncan 1995; Kastner et al. 1998). Enhancement of activity by attention has been demonstrated in a number of human imaging studies, indicating that information processing of feature dimensions or entire visual objects is facilitated in the regions of extrastriate visual cortex that selectively process these same attributes (Corbetta et al. 1990; Haxby et al. 1994; O’Craven et al. 1999; Wojciulik et al. 1998). Reduction of activity through competition between objects for neural processing resources has been demonstrated using fMRI (Kastner et al. 1998).

On the other hand, special classes of visual stimuli, such as faces (Holmes et al. 2003; Vuilleumier 2000; Vuilleumier et al. 2001; Langton et al. 2008) and body parts (Downing et al. 2004), and the way these can express emotion, are more likely to be perceived than others, such as a house, under conditions of inattention or deficit in orienting attention. Brain imaging studies have shown greater activation in the fusiform gyrus to fearful faces than to neutral faces that were presented while subjects performed a house-matching task (i.e., while the faces were unattended; Vuilleumier et al. 2001). This suggests that fearful faces recruit attention even when they are task-irrelevant and ignored. Therefore, while attention enhances activity elicited by objects selected in advance within the visual cortex, fearful faces, even when they are not selected in advance, capture attention and evoke stronger responses in visual cortex than neutral faces do. Fearful faces and other privileged stimuli that capture attention automatically are usually associated with biologically important functions such as survival, and/or their processing is indispensable for social interaction. Few reports are available on attentional capture by stimuli that are not threat-related.

Studies of face perception with infants and adult males as subjects indicate the possibility that attention to the female face is another strong biological drive (Kawashima et al. 1999; Quinn et al. 2002; Tzourio-Mazoyer et al. 2002). Infants show attraction to face patterns within the first minutes of life, and they exhibit a preference for female faces, especially their mother’s face, at around 2 months of age (Tzourio-Mazoyer et al. 2002). This could make female faces the primary visual stimulus category to which humans are especially sensitive and such expertise could manifest in gender discrimination tasks. For males, an attentional preference for female faces could be advantageous in adulthood, as it would enhance correct identification of potential mates. Indeed, using eye-tracking methodology, Alexander and Charles (2009) have demonstrated that adult male subjects distribute more attention to female faces than to male faces. Female subjects, on the other hand, were found to pay equal attention to both male and female faces. These considerations support the speculation that female faces, or even faces of the gender opposite to that of the observer, capture attention. If such attentional capture exists, one may find its signature in modulations of behavioral performance and neuronal responses in specific brain regions.

In the current study, we investigated the hypothesis that opposite-gender face stimuli will be processed differently than same-gender stimuli when they are relevant to the task (because they capture attention), and that responses to opposite-gender stimuli will be more difficult to suppress when they are task-irrelevant. At the behavioral level, we expect better performance (greater accuracy) when opposite- rather than same-gender stimuli are present. Because automatic capture of attention by opposite-gender stimuli would make them more resilient to suppression when they are task-irrelevant, we also expect the difference in cortical activity evoked by task-relevant and task-irrelevant stimuli to be greater for same-gender than for opposite-gender faces. We expect these differences to be more pronounced in face-selective areas in the fusiform gyrus (FG).

To test these hypotheses, we first examined the effect of gender difference on facial perception at a behavioral level, using male and female face and hand stimuli. Given that the results showed the expected effect for only male subjects, we carried out magnetoencephalography (MEG) experiments using male subjects only, to assess the neuronal responses elicited by our stimuli at different attention levels, that is, when the faces or hands belonged to a task-relevant or a task-irrelevant category.

Materials and Methods

Behavioral Experiment

Subjects

Forty subjects (20 women and 20 men, mean age 26.1 ± 12.3 years) from the University of Western Sydney participated in the behavioral experiment. The study was approved by the Human Research Ethics Committee at the University of Western Sydney.

Stimulus and Task

Twelve grayscale images of male and female faces and hands were used as visual stimuli (Fig. 1a). Face images included neutral, positive, and negative expressions and were selected from a standardized stimulus set (Ekman and Friesen 1976). In a separate rating task, the gender of each face was identified using a five-point scale: 1 = male, 3 = not male and not female, and 5 = female. A one-sample t-test comparing the means of ratings in response to each female face (t(59) = −9.0, P < 0.005) and male face (t(59) = 6.4, P < 0.005) showed significant differences from the gender-ambiguous rating of 3, indicating that the gender of each face could be reliably identified. Due to the lack of standardized sets for hand gestures, a set of emblematic hand gestures was developed (Abrahamyan 2008). Hands were chosen as visual stimuli because they are biological and can be expressive, thus having both similarities and differences with faces. In a separate stimulus selection task, participants rated the hand stimuli on valence, arousal, and meaning to select the best exemplars. We selected the images showing the most neutral, positive, and negative hand gestures, which were symmetrical along the vertical axis. The selected neutral hand gestures portrayed loosely hanging hands. The selected positive stimulus was a “thumbs up” gesture. The negative hand gesture was one showing the “middle finger”. The selected hand gestures were performed by the same male and female actors.

Fig. 1

Behavioral and MEG experiments. a Examples of face and hand stimuli. The horizontal and vertical dimensions of face stimuli in the behavioral and MEG experiment were equal. The spatial arrangement of left and right hands, however, was adjusted to make them comparable with the size of faces in the MEG experiment. b Experimental design. In alternate blocks of trials, subjects either discriminated target face gender or target hand gender. They counted the number of targets and reported this number (0, 1 or 2) at the end of the block. c Behavioral results. Male subjects were more accurate in responding to female faces than male faces (left); female subjects did not show a bias in gender-identification (right). In both groups, the difference in gender identification accuracy for male and female hands was not significant

The horizontal and vertical dimensions of the face stimuli subtended a visual angle of 8° and 10°, respectively. Hand stimuli subtended a visual angle of 13° horizontally and 8° vertically, as measured on the largest hand gesture. The eccentricity was 10° of visual angle. The slight difference in size between faces and hands was not an important factor, as they were treated as different conditions in the behavioral experiment. The size of hand gestures, however, was adjusted to be comparable with the size of faces when used in the MEG experiments (see stimulus and task for the MEG experiments).
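The relation between the visual angles quoted above and physical size on the screen can be sketched with the standard formula, size = 2 · distance · tan(angle / 2). This is only an illustration: the function names are hypothetical, and the 62 cm viewing distance is the approximate value reported for the behavioral experiment.

```python
import math


def size_for_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """On-screen extent (cm) that subtends angle_deg at the given viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Approximate viewing distance reported for the behavioral experiment.
VIEWING_DISTANCE_CM = 62.0

# Physical extent of a face stimulus that subtends 8 x 10 degrees.
face_width_cm = size_for_visual_angle(8.0, VIEWING_DISTANCE_CM)
face_height_cm = size_for_visual_angle(10.0, VIEWING_DISTANCE_CM)
```

Under these assumptions, an 8° × 10° face corresponds to a little under 9 cm × 11 cm on the screen.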

The subjects sat in a comfortable chair approximately 62 cm from the monitor, with their head supported by a chin rest. Although the stimuli varied systematically along the valence dimension, the task was only to count the number of target stimuli of the cued category and gender (female face, female hand, male face, or male hand). Target hands or faces could appear 0, 1, or 2 times in a block containing 8 stimuli (Fig. 1b). A block started with a task instruction (e.g. “Please count female hands”) for 2000 ms, followed by the display of eight stimulus items (both hands and faces) for 300 ms each. Stimulus items were presented one at a time in one of the quadrants, with the inter-stimulus interval (ISI) varying randomly between 1800 and 2300 ms; subjects fixated a central fixation cross, which was present throughout the block. At the end of the block, subjects were given 3000 ms to respond by pressing a key marked “0”, “1”, or “2” using their right hand. There were 24 zero-target blocks, 24 one-target blocks, and 36 two-target blocks in each face and hand condition.

Data Acquisition and Analysis

The experiment was programmed using DMDX stimulus presentation software (Forster and Forster 2003). Stimuli were displayed on a 19-in. (43-cm) CRT color monitor and the screen refresh rate was set at 80 Hz. To maintain vigilant central fixation, subjects were advised that their eye-movements were being monitored using a digital camcorder.

In agreement with signal detection theory, subjects’ responses were classified as follows. On a zero-target block, a “0” response was considered a correct rejection, whereas “1” and “2” were considered false alarms (FAs). On a one-target block, responding “1” was treated as a hit, and “0” and “2” were considered misses. In the case of two-target blocks, “2” responses were treated as hits, whereas “0” and “1” were considered misses. Although a “2” response on a one-target block and a “1” response on a two-target block could be treated as either a miss or an FA, we chose to treat them as misses to counterbalance the proportion of FAs and misses and to avoid biasing the results.
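The classification rules above can be written as a small lookup. This is a minimal sketch; the function name and the string labels are hypothetical and chosen only for illustration.

```python
def classify_block(n_targets: int, response: int) -> str:
    """Label one block's response according to the rules described in the text."""
    if n_targets not in (0, 1, 2) or response not in (0, 1, 2):
        raise ValueError("n_targets and response must each be 0, 1, or 2")
    if n_targets == 0:
        # No target was shown: only a "0" response is correct.
        return "correct_rejection" if response == 0 else "false_alarm"
    # One- and two-target blocks: any count other than the true one is a miss,
    # including over-counting on one-target blocks.
    return "hit" if response == n_targets else "miss"
```

For example, `classify_block(1, 2)` is labeled a miss rather than a false alarm, matching the convention chosen in the text.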

Standard d′ sensitivity measures, which are considered descriptive statistics in signal detection theory (Pastore et al. 2003), were computed for each condition and each individual independently in the case of one-target and two-target blocks. Proportions of the gender identification accuracy that had extreme values of either 0 or 1 were adjusted (Macmillan and Creelman 2005). As a result of this adjustment, the maximum possible d′ was 2.533, which corresponds to 100% gender identification accuracy.
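As an illustration of the d′ computation, the sketch below uses the usual definition d′ = z(hit rate) − z(false-alarm rate). The correction for extreme proportions shown here, replacing 0 with 1/(2N) and 1 with 1 − 1/(2N), is one common rule described by Macmillan and Creelman; the text does not state which correction was applied, so this choice is an assumption and the ceiling it produces need not equal the 2.533 reported above. The function names are hypothetical.

```python
from statistics import NormalDist


def adjusted_rate(count: int, n: int) -> float:
    """Proportion count/n, with 0 and 1 replaced by 1/(2n) and 1 - 1/(2n).

    The 1/(2n) rule is an assumed correction, not necessarily the authors'.
    """
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    rate = count / n
    if rate == 0.0:
        return 1.0 / (2.0 * n)
    if rate == 1.0:
        return 1.0 - 1.0 / (2.0 * n)
    return rate


def d_prime(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """d' = z(hit rate) - z(false-alarm rate), using the standard normal quantile."""
    z = NormalDist().inv_cdf
    return z(adjusted_rate(hits, n_signal)) - z(adjusted_rate(false_alarms, n_noise))
```

Here the hit rate would come from the one-target or two-target blocks of a condition and the false-alarm rate from its zero-target blocks.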

MEG Experiments

Subjects

Seven healthy, right-handed Caucasian male volunteers participated in the two MEG experiments (mean age, 30.0 ± 5.0 years). All subjects were in good health with no past history of psychiatric or neurological disease, and all gave informed, written consent. The MEG protocol had been approved by the Research Ethics Committee of RIKEN.

Stimulus and Task

Each stimulus was presented in one of the quadrants at 10.7° eccentricity from fixation across the diagonal. Face and hand images were 8.2 × 6.1° and 6.7 × 5.3° in size, respectively. To define regions of interest (ROIs), we performed a separate MEG experiment (MEG Experiment 1). This experiment consisted of three runs for each location. Each run contained 30 faces, 30 hands and 10 shoes (not discussed in this paper), and the stimuli were presented in random order, each for 300 ms. In this experiment, subjects fixated on a central cross and responded with a fast button press to subtle changes of the fixation cross from white to black.

Similar paradigms were used for the main MEG experiment (MEG Experiment 2) and the behavioral experiment (see stimulus and task for the behavioral experiment). MEG Experiment 2 consisted of 8 runs, each containing 12 blocks of 13 stimuli each. A block started with a 1500-ms cue (e.g. the words “male face”) defining what the target would be in the block. Each stimulus was presented for 300 ms, with the ISI randomly varying between 800 and 1200 ms. The task was to count the stimuli that matched the cue and to report this number (“0”, “1” or “2”) at the end of each block by pressing the appropriate response button within 2000 ms. In each run, there were 6 targets and 72 non-targets for each of the face and hand categories. The non-target stimuli can be divided into two categories: “task-relevant” stimuli that are in the same category as the cue and “task-irrelevant” stimuli that are in a different category from the cue. Targets were used to establish the task, and they occurred in only about 8% of trials. To rule out any effects of counting, these trials were not used in the analysis of evoked fields.
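The trial counts quoted above are mutually consistent, which can be checked with a few lines of arithmetic. The constant names below are hypothetical; the numbers are taken from the text.

```python
# Design parameters as stated for MEG Experiment 2.
RUNS = 8
BLOCKS_PER_RUN = 12
STIMULI_PER_BLOCK = 13
TARGETS_PER_CATEGORY = 6       # per run, for each of faces and hands
NONTARGETS_PER_CATEGORY = 72   # per run, for each of faces and hands
CATEGORIES = 2                 # faces and hands

# Stimuli per run, counted two ways.
stimuli_per_run = BLOCKS_PER_RUN * STIMULI_PER_BLOCK
stimuli_per_run_from_counts = CATEGORIES * (TARGETS_PER_CATEGORY + NONTARGETS_PER_CATEGORY)

# Fraction of trials that are targets, and total stimuli over all runs.
target_fraction = CATEGORIES * TARGETS_PER_CATEGORY / stimuli_per_run
total_stimuli = RUNS * stimuli_per_run
```

Both counts give 156 stimuli per run, and 12 targets out of 156 stimuli is about 7.7%, in line with the "about 8%" stated in the text.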

Data Acquisition and Processing

Magnetic fields were measured at the Laboratory for Human Brain Dynamics, Brain Science Institute, RIKEN with the MEG system (Omega 151, CTF Systems Inc., Vancouver, BC, Canada) in a magnetically shielded room (MSR). The signals from all channels were digitized at a sampling rate of 625 Hz. Stimuli were presented on the screen by back-projection with a DLP projector (HL8000Dsx+, NEC Viewtechnology Ltd., Tokyo, Japan) located outside the MSR. The Presentation software (Neurobehavioral Systems, Inc., Albany, CA) controlled the projector with a 96 Hz refresh rate. The exact onset time of each stimulus was determined by luminance detection with a photodiode on the screen.

Environmental noise was attenuated by passive shielding, by the gradiometer geometry of the primary sensors, and by application of the synthetic 3rd order gradient. Electrooculogram (EOG) and electrocardiogram (ECG) data were simultaneously recorded, and trials with eye movements or blinks exceeding 50 μV EOG signal change during the stimulus presentation period were discarded. In addition, components strongly correlating with either EOG or ECG were identified by independent component analysis (ICA) and were removed. The MEG signal was band-pass filtered at 3–200 Hz, with notch filters at 50 Hz and its harmonics to eliminate power-line noise. The 3 Hz high-pass filter eliminates slow variations that, although interesting for their own sake, are not the subject of the current investigation. We note that a 3 Hz high-pass filter may cause some distortion of the signal and of the regional activations derived from it, but such distortion does not affect the comparison between conditions, which is the main goal of the analysis.

The coordinates of MEG sensors were determined relative to the individual subject MRI for each run by the localization of fiduciary coils and our in-house co-registration procedure (Hironaga and Ioannides 2002).

The cleaned signal was then averaged separately for each condition within a run. Magnetic field tomography (MFT) (Ioannides et al. 1990; Taylor et al. 1999) was applied to each time slice (1.6 ms) of the averaged signal, leading to an independent tomographic estimate of neuronal activity. Instantaneous tomographic estimates were stored for later statistical analysis as a three-dimensional distribution of primary current density, J(r, t), in an array of grid points (17 × 17 × 17) covering the entire brain. For each grid point, the activity was defined by the direction of the current density and the modulus |J(r, t)|.

Statistical Parametric Mapping

For each subject, voxel-by-voxel statistical parametric mapping (SPM) analysis was performed using the data obtained in MEG Experiment 1. In this analysis, the Student’s t-test was used to compare the moduli of the estimated current density vector, |J(r, t)|, either between post- and pre-stimulus periods or between different conditions (e.g. face vs. hand). Window samples of all three runs for each condition were used for this comparison. In the first comparison, a baseline distribution was formed by random sampling from the pre-stimulus period (−250 to −50 ms) with 1.6 ms windows separated by at least 6.4 ms. This baseline distribution was contrasted with post-stimulus samples in either a 1.6 or 4.8 ms window centered at a latency T. After this comparison, the center of the window in the post-stimulus period was moved to (T + 1.6 ms) for the next comparison with the same baseline. The significantly activated areas computed with sampling windows of 1.6 ms were a little more compact than, but broadly similar to, the ones computed with sampling windows of 4.8 ms. In the second comparison, the two distributions to be compared were formed from samples within a 19.2 ms window, one from each condition and centered at the same latency, T. After this comparison, the centers of the windows for both conditions were moved to (T + 1.6 ms). Drawing samples from a range of latencies (a window) rather than at a single latency ensures that the results are robust, even when few single trials for each condition are available. If the smallest window is used (i.e. a single time slice), then effects due to latency jitter will be missed, leading to “kinks” in the time courses. A large window avoids this problem at the expense of reduced time resolution.
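The second, between-condition comparison can be illustrated for a single voxel as a t statistic recomputed in a window that slides one time slice at a time. This is a simplified sketch and not the authors' implementation: the function names are hypothetical, the window is given as a half-width in time slices (so it always spans an odd number of slices, which only approximates a 19.2 ms window at 1.6 ms per slice), and pooling every in-window value from every trial or run into one sample set is an assumption about how the distributions were formed.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (assumes nonzero variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def sliding_t(cond_a, cond_b, half_width):
    """t statistic per window center for one voxel.

    cond_a, cond_b: lists of time courses (one list of |J| values per trial or run),
    all of the same length. half_width: window half-width in time slices.
    Returns {center_index: t} for every center where the full window fits.
    """
    n_slices = len(cond_a[0])
    result = {}
    for center in range(half_width, n_slices - half_width):
        lo, hi = center - half_width, center + half_width + 1
        samples_a = [v for course in cond_a for v in course[lo:hi]]
        samples_b = [v for course in cond_b for v in course[lo:hi]]
        result[center] = student_t(samples_a, samples_b)
    return result
```

Moving the center by one index corresponds to the 1.6 ms step described in the text; converting the t values to significance levels would additionally need the t distribution, which is left out here.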

Defining Region of Interest

We used anatomical criteria and the SPM results to define regions of interest (ROIs) with a radius of 10 mm for V1 and for the FG of each subject. Having fixed the ROI center, we used circular statistics (Fisher 1993; Ioannides et al. 2005) to define the “main direction,” i.e. the dominant direction of the MFT current density inside the given ROI.

We defined separate V1 ROIs for each position of the stimuli in the visual field (VF), and all agreed with the well-established retinotopic organization of V1 (Engel et al. 1997). For example, the MFT solutions for stimuli in the lower-right VF identified an ROI in the left dorsal bank of the calcarine fissure, with Talairach coordinates (Talairach and Tournoux 1988) corresponding to V1. The center and main direction of the V1 ROIs were determined from the MFT solutions in the range of 40 to 100 ms (Moradi et al. 2003).

Two ROIs for the FG (right and left) were defined for each of the left and right peripheral locations (always using the contralateral stimulus). To identify the spatial coordinates of common FG activations across subjects, we generated SPMs for individual subjects by comparing the pre- and post-stimulus periods, and these SPMs were then projected to the common Talairach space. ROIs in each individual MRI were confined to the identified coordinates and anatomical landmarks: the collateral sulcus and the temporal occipital sulcus. The direction of the current density was defined from the MFT solutions elicited by face stimuli for the ROI-defining runs in the 120–180 ms range (Okazaki et al. 2008).

Analysis of Regional Activity

We used an algorithm called localizing individual area neuronal activity (LIANA) to compute the activity in an ROI. LIANA is a three-step algorithm producing estimates of the time course of activity within a predefined ROI (Hironaga and Ioannides 2007). The first step of LIANA uses ICA to decompose the signal into independent components (ICs) and their time-dependent weights. The second step extracts tomographic estimates of activity for each IC. In the final step, the ICs that do not belong to obvious noise artifacts and have high relative strength within a predefined ROI are identified; they are then combined according to the time-dependent weights derived in the first step and the MFT estimates for the current density vector to extract the time course of activity in the given ROI for each single trial. Thus, LIANA allows reliable extraction of single-trial regional activations for one or more ROIs directly from raw MEG data without the need for a full MFT analysis. Specifically, we applied LIANA to the unaveraged data to obtain single-trial estimates within the V1 and FG ROIs (defined from the SPM analysis of the averaged data). The results reported here are derived from further analysis of these single-trial regional activations.

Signal-to-Noise Ratio and Gender Index Analysis

For each stimulus condition and subject, the signal-to-noise ratio (SNR) (Laskaris and Ioannides 2001) was computed from the LIANA activations for V1 and FG with a 32.0-ms running window stepped every 1.6 ms. We emphasize that the SNR is computed at each latency t from the pattern of activation across p time slices centered at latency t, with sample step τ, over n single trials. We make this explicit by writing SNR(t, p; τ) for each time window centered at latency t, computed as the ratio between signal power, SP(t, p; τ), and noise power, NP(t, p; τ). The pattern of length p centered at t of the ith single trial, \( X_{i}(t, p; \tau) \), is the p-dimensional vector whose components are the p signal values centered at t: \( X_{i}(t, p; \tau) = \left[ x_{i}\left( t - \frac{p - 1}{2}\tau \right), \ldots, x_{i}(t - \tau), x_{i}(t), x_{i}(t + \tau), \ldots, x_{i}\left( t + \frac{p - 1}{2}\tau \right) \right]. \) The equations defining the noise and signal power and the SNR are written below in terms of the p-long patterns, one from each of the n single trials. We emphasize that these quantities depend explicitly on the center latency t and the length of the segment p (and the sample step τ), inheriting this dependence from the dependence of \( X_{i}(t, p; \tau) \) on the same quantities. In the equations below, we write \( X_{i} = X_{i}(t, p; \tau) \), dropping from the notation the obvious dependence of the single-trial segments on t, p and τ:

$$ \begin{gathered} \text{NP}(t, p; \tau) = \frac{\sum\nolimits_{i = 1}^{n} \left\| X_{i} - \bar{X} \right\|_{L_{2}}^{2}}{p(n - 1)},\quad \text{SP}(t, p; \tau) = \frac{1}{p}\left\| \bar{X} \right\|_{L_{2}}^{2} - \frac{1}{n}\text{NP}(t, p; \tau), \\ \text{SNR}(t, p; \tau) = \frac{\text{SP}(t, p; \tau)}{\text{NP}(t, p; \tau)},\quad \bar{X} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} X_{i} \end{gathered} $$

where \( \left\| \cdot \right\|_{L_{2}}^{2} \) denotes the squared \( L_{2} \) norm of a p-dimensional vector.

In summary, at each latency t, the NP is an estimate of the population variance and the SP is an estimate of the (noise-corrected) average power of the n single trial segments. The SNR is simply the ratio of SP to the NP. The instantaneous values of normalized SNR for each stimulus category were analyzed using ANOVA with Gender (Female, Male), Task (Relevant, Irrelevant), Hemisphere (left, right), and VF (upper, lower) as fixed factors, and Subject (seven subjects) as a random factor.
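The definitions of NP, SP and SNR given above translate directly into code. The following is a minimal sketch for one ROI and one latency, not the authors' implementation: the function name and the data layout (one list of samples per single trial, with latency given as a sample index) are assumptions. If both endpoints are counted, a 32.0-ms window at a 1.6 ms step would correspond to p = 21, but that value is an inference rather than something stated in the text.

```python
def snr_at(trials, center, p):
    """Return (SP, NP, SNR) for the p-sample pattern centered at index `center`.

    trials: list of n single-trial time courses (lists of floats) on a common
            time grid with sample step tau.
    center: sample index of the latency t.
    p:      odd pattern length in samples.
    """
    n = len(trials)
    if n < 2:
        raise ValueError("need at least two single trials")
    if p < 1 or p % 2 == 0:
        raise ValueError("p must be a positive odd number")
    half = (p - 1) // 2
    if center - half < 0 or center + half >= len(trials[0]):
        raise ValueError("window does not fit inside the time course")

    # X_i: the p-long pattern of each trial, centered at the chosen latency.
    patterns = [trial[center - half:center + half + 1] for trial in trials]
    # X-bar: the mean pattern across trials.
    mean_pattern = [sum(column) / n for column in zip(*patterns)]

    # NP: summed squared deviation from the mean pattern, divided by p(n - 1).
    noise_power = sum(
        (x - m) ** 2 for pattern in patterns for x, m in zip(pattern, mean_pattern)
    ) / (p * (n - 1))
    if noise_power == 0.0:
        raise ValueError("identical trials: noise power is zero, SNR undefined")

    # SP: power of the mean pattern per sample, corrected by NP / n.
    signal_power = sum(m * m for m in mean_pattern) / p - noise_power / n
    return signal_power, noise_power, signal_power / noise_power
```

Calling the function for successive values of `center` gives the running SNR curve; because SP subtracts NP / n, the estimate can be negative when the mean pattern is weak relative to the trial-to-trial scatter.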

The measures we have introduced capture two kinds of variations. The first is “true” external noise introduced by environmental sources and the acquisition electronics. The second is due to trial-to-trial variability in the source strength and latency jitter (or phase). We note in passing that the more widely used average signal and results using spatial filters also mix external noise and source profile variability, as any single-value measure would necessarily do. The use of the synthetic third-order gradient, together with the ICA-based identification and removal of artifacts, reduces the first contribution substantially, so the “clean” signal and any measure derived from it reflect primarily properties of the underlying generators. These properties, as stated earlier, are of two distinct types. Even for the clean signal, the SNR, the average signal and measures based on spatial filters are different ways of providing a single-value measure for at least two kinds of variations in the source response profile across trials. We chose to work with SNR because these two variations are explicitly quantified by its defining parts, SP and NP: SP is a measure of the energy of the consistent responses, while NP is a measure of the variability in the energy of the remaining response. For the problem at hand, changes in SNR between female and male face stimuli can therefore be due to changes in the trial-to-trial variability (NP), in the amplitude of the mean evoked response across trials (SP), or in both. Therefore, we computed a Gender-Effect Index (GI) separately for each quantity and calculated correlations between the GI from SNR and the GI from NP or SP to determine the underlying cause of the gender effect in V1 and FG. Specifically, for reliable signals, i.e. \( (F_{\text{SNR}} > 0) \cap (M_{\text{SNR}} > 0) \), we defined \( \text{GI}_{X} = (F_{X} - M_{X})/(F_{X} + M_{X}) \) for each quantity X (SNR, NP or SP), where \( F_{X} \) and \( M_{X} \) are the SNR, NP or SP for female and male faces, respectively. Positive index values indicate a stronger bias to female faces, whereas values near zero indicate no such bias or gender effect.
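The Gender-Effect Index defined above can be sketched as follows. The function name and the dictionary layout are hypothetical; the reliability criterion (both SNR values positive) and the (F − M)/(F + M) form are taken from the text.

```python
def gender_indices(female, male):
    """Gender-Effect Index for SNR, NP and SP at one latency sample.

    female, male: dicts with keys "SNR", "NP", "SP" holding the values for
    female and male faces. Returns None when the reliability criterion
    (both SNR values positive) is not met.
    """
    if not (female["SNR"] > 0 and male["SNR"] > 0):
        return None
    return {
        key: (female[key] - male[key]) / (female[key] + male[key])
        for key in ("SNR", "NP", "SP")
    }
```

Because NP is a variance estimate and SP has the same sign as SNR, the denominators are positive whenever the reliability criterion holds (assuming NP is nonzero), so each index lies between −1 and 1.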

Results

Behavioral Results

Statistical analyses of d′ values were conducted separately for male and female participants to assess gender identification accuracy in response to the opposite gender. Only one comparison showed a significant difference (Fig. 1c): in the case of male subjects, the gender identification accuracy for female faces was significantly greater than that for male faces (t(119) = 3.1, P < 0.005, two-tailed paired t-test). There was no significant difference in gender identification accuracy in response to male and female hands (t(119) = 1.6, P = 0.105). In the case of female subjects, there was no significant difference in response to male and female faces (t(119) = 1.0, P = 0.328, two-tailed paired t-test) or male and female hands (t(119) = 1.5, P = 0.131).

MEG Results

ROIs for V1 and FG were defined by SPM with the pre-stimulus as baseline, in MEG Experiment 1 where subjects were engaged in a change detection task concerning the fixation cross while the stimulus was displayed in one of the four quadrants. The baseline SPM showed the earliest focal activity in V1 within 100 ms of stimulus onset at loci around the calcarine fissure, which agreed with the known retinotopy. A wide area was found to be activated by faces between 100 and 200 ms (Fig. 2a), the generally accepted face-specific component in electroencephalography (EEG) (N170) (Botzel and Grusser 1989) and MEG (M170) (Halgren et al. 2000; Liu et al. 1999) research. The direct SPM contrast between faces and hands identified a significant increase of activity for faces at the same locations as had been identified in the baseline comparison (Fig. 2b). The coordinates for the common V1 and FG ROIs are listed in Table 1, together with the Talairach coordinates for the FG obtained in other related fMRI studies.

Fig. 2

a SPM in ventro-temporal areas as a function of time, computed from the contrast of distributions in the post- and pre-stimulus periods. The post-stimulus distribution at each latency T, was composed of the samples in a 4.8 ms window centered at T. The distribution of the pre-stimulus period (baseline) was formed by random sampling with 1.6-ms windows separated by at least 6.4 ms. These distributions were formed using samples from all three runs for each stimulus position on the visual field. The contours show significant change of activity induced by a face stimulus presented at the lower right (top) and lower left (bottom) for a single subject. b Face preferential activity and ROIs for each subject. Contours show statistically higher activity elicited by faces than hands for five of seven subjects (dash), and common to six of seven subjects (solid) around 140 ms. Blue small squares indicate the ROI center identified by the comparison between the pre- and post stimulus period for individual subjects

Table 1 Talairach coordinates x, y, z (mean ± SD) in mm for right and left fusiform gyrus ROIs defined by the SPM baseline comparison, and left/right-dorsal/ventral V1 ROIs defined by MFT solutions for peripheral presentations

We used the LIANA algorithm to extract activation curves for predefined ROIs, defined as described in “Materials and Methods” section, for left and right V1 and left and right FG. Figure 3 shows the grand average activation curves for Experiment 2, obtained by averaging across single trials and subjects, separately for each ROI and stimulus category. The figure shows some difference between female and male faces, but this difference is small and not easily disentangled from the rough time course, presumably caused by latency jitter and variability in the strength from trial to trial. As described in the “Materials and Methods” section, and in more detail elsewhere (Laskaris and Ioannides 2001), we refined the analysis by computing the SP, NP and SNR for each latency, t, using finite length segments of activation curves centered at t. Using segments of single trial data, rather than single time-points, makes the SNR less sensitive to the underlying source variation in power and phase, leading to a robust accurate measure even with a small number of single trials. The robustness of the SNR results is reflected in its time course, which in terms of smoothness is comparable to simple average measures obtained with many times the number of trials. In addition, by studying separately the SP and NP constituents of SNR, we can probe the underlying mechanisms of the source variation.

Fig. 3

The time course for the grand average signal for V1 (top row) and FG (bottom row) ROIs, elicited by face (left column) and hand (right column) stimuli. Separate grand average curves are shown for male and female stimuli. The grand average time courses were computed from single trial activations extracted from the MEG signal using the LIANA algorithm

In MEG Experiment 2, we used the same (gender identification) task as in the behavioral experiment. Using the regional activity from the predefined ROIs, we first computed the instantaneous SNR for female and male face and hand stimuli and compared them when faces and hands were task-relevant or task-irrelevant. SNR for hand stimuli showed no gender bias either in V1 or in FG. In contrast, the SNR for face stimuli showed clear gender effects in V1 and FG. Specifically, the SNR for female face stimuli was significantly larger than the SNR for male face stimuli at three latency ranges, as shown in Fig. 4a: 71.0–82.0 ms (peaking at 79.0 ms, Gender, F(1, 6) = 8.2, P < 0.05), 124.0–135.0 ms (peaking at 132.0 ms, Gender × Task, F(1, 6) = 6.8, P < 0.05) and 218.0–246.0 ms (peaking at 226.0 ms, Gender × VF, F(1, 6) = 21.9, P < 0.005). Two post hoc comparisons at 226.0 ms using a Bonferroni-adjusted alpha of 0.025 showed that responses in V1 to female face stimuli are significantly higher than responses to male face stimuli in lower VF (t(27) = 3.0, P < 0.01), but not in upper VF (t(27) = 0.2, P = 0.854). FG also showed an interaction between Gender and Task at 124.0–140.0 ms (peaking at 135.0 ms, F(1, 6) = 6.8, P < 0.005) (Fig. 4b).

Fig. 4

The averaged SNR curves from V1 (a) and FG (b). Comparison of the instantaneous amplitude for female and male faces (left) and hands (right) showed a significant gender effect at the gray shaded area. c Increase in response to female faces in V1 (top) and FG (bottom). As expected, the activity in response to male faces is significantly reduced in the task-irrelevant condition compared with the task-relevant condition. However, activity for the irrelevant female face is maintained at the same level as in the task-relevant condition. *** P < 0.005

For the interaction Gender × Task in each of V1 and FG, four post hoc comparisons were conducted on the accuracy scores using a Bonferroni-adjusted alpha of 0.0125. The preference for female stimuli was observed in the task-irrelevant condition (two-tailed t-test, V1: t(27) = 3.3, P < 0.005; FG: t(27) = 4.3, P < 0.0005), but not in the task-relevant condition (V1: t(27) = −3.3, P = 0.739; FG: t(27) = 0.7, P = 0.505). As expected, the SNR in response to male faces was significantly attenuated in the task-irrelevant condition (V1: t(27) = 3.2, P < 0.005; FG: t(27) = 3.2, P < 0.005), compared to responses to male faces in the task-relevant condition (Fig. 4c). However, the SNR for female faces did not differ across task-irrelevant and task-relevant conditions (two-tailed t-test, V1: t(27) = −1.0, P = 0.317; FG: t(27) = −0.9, P = 0.387). The interaction Gender × Task identified in both V1 and FG shows that the presence of a female face captures the attention of male subjects even when a female face belongs to a category that is task-irrelevant.

We explored the SNR results further using gender index (GI) analysis (see Materials and Methods). The index distributions in Fig. 5 show that for both V1 and FG, 83.3% of the samples are robustly shifted to positive index values for SP (GISP), while no such gender bias was seen in GINP. Correspondingly, the correlation of gender indices is positive and statistically significant for SP (Pearson correlation, V1: r = 0.963, P < 0.001, FG: r = 0.964, P < 0.001) but not for NP (V1: r = −0.412, P = 0.090, FG: r = −0.177, P = 0.483). These results show that the increase in SNR for female faces is caused by an increase in amplitude of single trial activity (SP) rather than a decrease in trial-to-trial variability (NP).
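The logic of this comparison can be sketched in Python as follows. The gender index is defined in Materials and Methods; the normalized-difference form used below, the function names, and the sample values are hypothetical stand-ins chosen only so that a positive index means a bias toward female faces. The sketch shows how a strong correlation of the SNR index with the SP index, together with a weak correlation with the NP index, would point to signal amplitude rather than trial-to-trial variability.

```python
import math


def gender_index(female: float, male: float) -> float:
    """Hypothetical normalized-difference index; positive = bias toward female faces."""
    return (female - male) / (female + male)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Illustrative index values only (hypothetical, not taken from the study).
gi_snr = [0.10, 0.25, 0.05, 0.30, 0.18]
gi_sp = [0.12, 0.22, 0.04, 0.33, 0.20]   # tracks the SNR index closely
gi_np = [0.02, -0.01, 0.03, 0.00, -0.02]  # scattered around zero

print(f"r(SNR, SP) = {pearson_r(gi_snr, gi_sp):.3f}")
print(f"r(SNR, NP) = {pearson_r(gi_snr, gi_np):.3f}")
```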

Fig. 5

Correlation between Gender-Effect Index for SNR (GISNR) and SP (GISP)/NP (GINP) in V1 (left figures) and FG (right figures). Positive values indicate a bias toward female faces. Dashed lines indicate the 95% confidence interval

Pessoa et al. reported that attentional capture by emotional faces is possible only if enough attentional resources are available to process the faces (Pessoa et al. 2002). In our MEG Experiment 1, the subject’s attention was always sharply focused on the central fixation cross, leaving few processing resources for the irrelevant female and male stimuli that appeared in the periphery. We applied the analysis used in MEG Experiment 2 to the data of MEG Experiment 1 to test whether the female face preference would persist. The results showed no female face preference either in V1 (Fig. 6a) or in FG (Fig. 6b). Moreover, the absence of a gender effect in MEG Experiment 1, especially in V1, rules out differences in physical features of stimuli, such as luminance, size, or contrast, as causes of the gender effect observed in MEG Experiment 2.

Fig. 6

The average SNR curves for each stimulus from V1 (a) and FG (b) in the first experiment. Female faces did not differ significantly from male faces in a comparison of the instantaneous amplitude of SNR

Discussion

We studied the processing of two categories of biologically important stimuli, faces and hands, when they belonged to a task-relevant or task-irrelevant category. In a behavioral experiment, subjects counted the number of gender-specific targets from face or hand categories according to a “cue” provided at the beginning of blocks of stimuli, and reported the number of targets at the end of each block. We found that male subjects were significantly more accurate in responding to female, compared with male, face target blocks. No corresponding effect was found for female subjects, i.e. females responded equally well to blocks where the targets were either male or female faces. No gender effect in response to hand stimuli was found for either male or female subjects.

Eye-tracking studies have shown similar results: adult male subjects direct more attention to female faces than to male faces, whereas female subjects pay equal attention to both (Alexander and Charles 2009). This could be because the attention of male subjects is automatically recruited by the presence of a female face, so that turning the gaze towards it cannot be helped. It must be noted, however, that neither the accuracy data nor the eye-tracking data allow us to study task-relevant and task-irrelevant responses separately, or to probe where and when in the brain the two responses might differ. As a result, involuntary attention is not the only possible interpretation; a contribution from voluntary attention directed by the “cue” cannot be completely excluded.

We explored the underlying mechanism further by adapting the experimental protocol for two MEG experiments with male subjects. In MEG Experiment 1, we defined ROIs for V1 and the face-selective FG brain areas for our subjects from the responses to stimuli presented in the contralateral VF, while subjects detected color changes of a central fixation cross. In MEG Experiment 2, the task required subjects to identify gender-specific targets from one category (face or hand) within a block of trials. The stimulus in each trial was therefore either task-relevant or task-irrelevant, depending on whether it belonged to the target category. The time course of the evoked responses in the V1 and FG ROIs was extracted for each single trial. It was thus possible to compare directly the responses evoked by task-relevant and task-irrelevant stimuli. The results demonstrated that the responses in both V1 and FG were reduced for male faces when they were not relevant to the task. In contrast, female faces maintained a strong response in both task-relevant and task-irrelevant conditions, demonstrating that female face stimuli are more resilient to suppression than are male face stimuli.

The early stimulus gender effect in V1 and FG, identified in MEG Experiment 2, could be the result of earlier levels of processing rather than a preference for female faces per se, possibly caused by physical feature differences in the stimuli (e.g. luminance, size, contrast, etc.). We analyzed the difference between female and male faces for MEG Experiment 1, where subjects detected changes in a central fixation cross while stimuli were presented to the peripheral VF. There was no preference for female faces. This suggests that the significant difference between female and male faces observed in MEG Experiment 2, although likely to rely on feed-forward input via the ventral pathway (Mishkin et al. 1983), is not due to physical feature differences among the stimuli. Additionally, the absence of the female face preference in both V1 and FG in MEG Experiment 1 is in line with the results of Pessoa et al., and their interpretation that attentional capture requires sufficient attentional resources to process the faces (Pessoa et al. 2002). In other words, in MEG Experiment 1, the attention of a subject is always sharply focused on the center fixation cross, leaving only a few available resources for processing irrelevant female and male stimuli in the periphery.

The evidence of a female face preference for male subjects in the behavioral and MEG experiments leads us to believe that attraction to members of the opposite sex is involved. Indeed, a recent fMRI study showed increased activation for the sexually preferred category in specific brain regions in the thalamus and medial orbitofrontal cortex when heterosexual men and homosexual women responded to female faces, and heterosexual women and homosexual men responded to male faces (Kranz and Ishai 2006). This gender effect, related to sexual preference, was not found in the face-selective FG, despite the fact that FG was strongly activated by the stimuli. The absence of a gender effect in FG and V1 in the Kranz and Ishai fMRI study may be either because the modulation of the FG activity was too brief to produce a sufficient hemodynamic response, or because the task did not draw attention away from the stimuli (i.e., a ceiling effect). This latter interpretation is consistent with a gender effect being present only in the task-irrelevant condition, as seen here. The most parsimonious interpretation of this result is that male subjects process male and female faces similarly in task-relevant conditions but differently in task-irrelevant conditions. In the task-relevant condition, the added “weight” of an attended stimulus dominates contributions due to involuntary attention. In the task-irrelevant condition, however, a female face automatically captures attention even when irrelevant, while the response to a male face is more effectively reduced by top–down inhibition. Moreover, the absence of such gender effects when subjects are preoccupied with another demanding task suggests that male subjects can, but only with difficulty, selectively divert resources away from an irrelevant female face.

More work is needed to fully understand how male and female subjects respond under different attentional conditions to stimuli with faces of the same and opposite gender. For example, we would like to carry out further behavioral and MEG experiments with varying attentional demands, using both male and female subjects. Nevertheless, our results provide the beginning of a neural justification for the idea that males cannot concentrate fully on a task when a female face is present in the environment.