Introduction

Face perception is one of our most important and interesting functions. For example, a schematic drawing consisting of a circle, two dots, and a straight line is recognized as a face, although no individual component of the drawing is part of a real face. Since Bruce and Young (1986) proposed the “face recognition model”, there have been many studies of face perception using neuroimaging and electrophysiological methods (e.g., Bentin et al. 1996; George et al. 1996; Haxby et al. 1996, 1999; Honda et al. 2007; Itier and Taylor 2004a; Itier et al. 2006; Kanwisher et al. 1998; Latinus and Taylor 2006; Miki et al. 2007; Rossion and Jacques 2008; Sagiv and Bentin 2001; Shibata et al. 2002; Watanabe et al. 1999b, 2002, 2003, 2005). In studies of event-related potential (ERP) using averaging electroencephalography (EEG) (e.g., Bentin et al. 1996; George et al. 1996), static human faces evoked a negative potential in the bilateral occipito-temporal areas peaking at around 170 ms, termed N170, which is considered to be sensitive to faces.

N170 is affected by face inversion and the scrambling of facial features. In previous studies using inverted faces (Bentin et al. 1996; Honda et al. 2007; Itier and Taylor 2004a; Itier et al. 2006; Latinus and Taylor 2006; Sagiv and Bentin 2001; Watanabe et al. 2003, 2005), faces with scrambled features (George et al. 1996; Latinus and Taylor 2006), and individual components such as the eyes and nose (Itier and Taylor 2004a; Itier et al. 2006; Shibata et al. 2002; Watanabe et al. 1999a), the N170 had a longer latency for inverted faces than for upright faces, even though low-level properties such as luminance were the same. These results indicate that N170 is related to differences in higher-level processing rather than changes in luminance and that its latency is affected by whether a subject easily and quickly detects a stimulus as a whole face or not.

However, there are several problems with studies of ERP as follows: (1) Because the presentation of a visual stimulus is accompanied by a change in luminance, cortical activity related to the processing of visual shape perception (e.g., a circle, letter, or schematic face) cannot be completely separated from the strong and long-lasting activities in the primary visual areas, which are evoked by the change in luminance regardless of the stimulus shape. (2) The generator source of N170 has not yet been confirmed, although some regions such as the superior temporal sulcus (STS), inferior temporal (IT) area, and other surrounding regions are candidates, as reported in previous studies (Bötzel et al. 1995; Herrmann et al. 2005; Itier and Taylor 2004c; Mnatsakanian and Tarkka 2004; Pizzagalli et al. 2002; Rossion et al. 2003; Schweinberger et al. 2002; Shibata et al. 2002; Taylor et al. 2001; Watanabe et al. 2003); however, N170 could be contaminated with activity generated in the primary visual cortex, as described above. The source of M170 in previous MEG studies is considered to be only the IT area, since only the tangential dipole can be recorded by MEG (Deffke et al. 2007; Halgren et al. 2000; Linkenkaer-Hansen et al. 1998; Sams et al. 1997; Swithenby et al. 1998; Tanskanen et al. 2005; Watanabe et al. 1999a, b).

It is very difficult to distinguish the activity in the IT area related to face perception processing from that in the primary and other higher visual areas, for example, STS. Certainly, there are various methods to remove primary visual area activity in previous studies, such as subtracting the ERP to phase-scrambled faces from normal faces (Rousselet et al. 2005, 2007a, b) or alternating face identity without a blank screen inserted (Jacques and Rossion 2006); however, the sources of these components in previous studies have not been elucidated and remain unknown. In contrast, the new visual stimulus method in this study, random dots blinking (RDB), uses temporal changes in patterns of a large number of small dots to present stimuli without a change in luminance, during the presentation of an object (e.g., a circle, a letter, or a schematic face), and consequently reduces activities in the primary and other higher visual areas and detects only activity in the IT area (Okusa et al. 1998, 2000; Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007). In addition, the subtraction method may have a risk of showing artificial activity or reducing real activity. Therefore, in this study, we used the RDB method to investigate the activity in the IT area related to face perception processing (Fig. 1).

Fig. 1

Visual stimuli presented by using random dots blinking (RDB). The method employs a series of dot patterns that appear to be blinking randomly. For the visual stimuli, the dots within the white lines remain still while those outside move, and therefore the subject perceives a shape. In this study, four stimuli were presented: (1) Upright: a schematic face consisting of a facial contour, eyes, and a mouth, (2) Inverted: the Upright stimulus inverted, (3) Scrambled: the same contour and features as in Upright and Inverted but with the spatial relation distorted, and (4) Star: a star shape whose occurrences were counted by the subjects and reported after the sessions

As already mentioned, it is interesting that a simple schematic drawing consisting of a circle for a facial contour, two dots for eyes, and a straight line for a mouth is detected as a face (Fig. 1). However, when the spatial relation among the individual components is distorted, a face can no longer easily be detected. Previous studies reported that the component N170 was evoked by schematic faces as well as photographs of real faces (Latinus and Taylor 2006; Sagiv and Bentin 2001). In the present study, we used a schematic face, which subjects detect as a face and which is easily drawn by the RDB method.

The main objectives of this study were to investigate (1) how a component that reflected only activity in the IT area related to face perception was evoked by upright, inverted, and scrambled faces using RDB (Fig. 1), (2) whether configural distortions of the face affect the component, and (3) which process of face perception the component reflects. We compared ERPs evoked by an upright schematic face (with the contour and features of a face) with those evoked by an inverted schematic face and by a scrambled face in which the spatial relation among the contour and features is distorted, although the elements were the same among conditions (Fig. 1).

Methods

Subjects

We studied ten right-handed volunteers (3 females, 7 males; mean age 34.5 years) with normal or corrected visual acuity. All subjects gave informed consent to participate in the experiment, which was approved by the Ethics Committee of the National Institute for Physiological Sciences.

Visual stimuli

We presented visual stimuli using RDB (Okusa et al. 1998, 2000; Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007). With this method, white dots were presented on a black background (6.2 × 6.2 degrees), arranged in 256 rows and 256 columns. Every 16.7 ms, each dot turned white or black according to a pseudo-random plan, so as to present a sequence of random dot patterns, i.e., the white dots seemed to move randomly. The ratio of white to black dots was fixed at 1:3 during the presentation to keep the mean luminance of the field constant. During the display of a stimulus, the dots forming the stimulus were kept unchanged (stopped within the white line in Fig. 1), whereas the other dots continued to change (moved outside the white line in Fig. 1). This segregated the stimulus from its surroundings, so that subjects perceived the shape of the stimulus without a change in luminance (Fig. 1). With this method, the activities of the primary visual areas evoked by a change in luminance are markedly inhibited or disappear (see Okusa et al. 1998; Noguchi et al. 2004, 2007).
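The frame-by-frame logic of RDB described above can be illustrated with a minimal sketch. It is not the authors' stimulus program: the function name `make_rdb_frames`, the flat-list frame layout, and the choice to re-randomize the surround by permuting the existing dot values (which keeps the white:black ratio exactly constant) are all assumptions made for illustration.

```python
import random

def make_rdb_frames(n_frames, size, shape_mask, white_fraction=0.25, seed=0):
    """Return n_frames frames; each frame is a flat list of size*size dots.

    1 = white dot, 0 = black dot. shape_mask[r][c] is True for dots that
    belong to the stimulus shape (hypothetical input, e.g. a schematic face).
    """
    rng = random.Random(seed)
    n_cells = size * size
    n_white = round(n_cells * white_fraction)  # 1:3 white:black -> 25% white

    # First frame: a fully random pattern with the fixed white:black ratio.
    first = [1] * n_white + [0] * (n_cells - n_white)
    rng.shuffle(first)

    # Flat indices of the dots outside the shape; only these keep changing.
    outside = [r * size + c
               for r in range(size) for c in range(size)
               if not shape_mask[r][c]]

    frames = [first]
    for _ in range(n_frames - 1):
        nxt = list(frames[-1])
        # Assumption: permute the surround values so the total number of
        # white dots (and hence the mean luminance) never changes.
        values = [nxt[i] for i in outside]
        rng.shuffle(values)
        for i, v in zip(outside, values):
            nxt[i] = v
        frames.append(nxt)  # dots inside the shape are left untouched
    return frames
```

At one frame every 16.7 ms, a 500 ms stimulus corresponds to roughly 30 frames; the study's field would use `size=256`.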

The stimuli appeared in the center of the random dots field and then disappeared. Four stimuli were presented as follows (Fig. 1):

(1) Upright: a schematic face consisting of a large circle, two dots, and a straight line. Subjects described that it looked like a face with a contour, eyes, and a mouth.

(2) Inverted: the Upright stimulus was inverted, but the spatial relation between the contour and features was preserved.

(3) Scrambled: the spatial relation between the contour and features was distorted. The Scrambled stimulus was vertically and horizontally symmetrical, so that subjects could judge it as neither upright nor inverted.

(4) Star: subjects were asked to count the occurrences of Star (Fig. 1) and to report the number after the sessions, in order to draw their attention to this stimulus and away from Upright, Inverted, and Scrambled. The shape of the Star was very different from the shapes of the other stimuli.

Upright, Inverted, and Scrambled were made from only one kind of schematic face, and only one exemplar of each stimulus was used (Fig. 1).

In all conditions, a random dot pattern was shown for 1,500 ms and then the stimulus immediately appeared for 500 ms (Fig. 1). The stimuli were presented in random order using a personal computer (PC, IBM) and a cathode ray tube (CRT) screen. The distance between the subject’s eyes and the CRT screen was 120 cm. Stimuli were projected centrally in the visual presentation field. Subjects were asked to focus on a small red cross at the center of the stimulus during the experiment. To familiarize the subjects with each RDB image and the experimental paradigm, several trials were run before the EEG recordings for all the subjects.

EEG recordings

We recorded EEG using the standard 19 electrodes of the 10–20 system (Fp1, Fp2, F3, F4, F7, F8, Fz, C3, C4, Cz, T3, T4, T5, T6, P3, P4, Pz, O1, O2) with Ag/AgCl disk electrodes (Fig. 2a). The electrode impedance was kept below 5 kΩ. The reference was placed at the tip of the nose.

Fig. 2

a Locations of the 19 electrodes. b The topographical map of the grand-averaged ERP evoked by the Upright, Inverted, Scrambled, and Star conditions averaged in the time frame of 235–270 ms

We used Neurofax (Nihon-Kohden, Tokyo, Japan) for recording the data and EPlyzer (Kissei Comtec, Nagano, Japan) for analyzing the data. The bandpass filter was 0.1–80 Hz, which did not smear the components, and the sampling rate was 1,000 Hz. Eye movements of subjects were monitored during EEG recordings with horizontal and vertical electrooculograms (EOGs). Epochs in which signal variations of EEG and EOG were larger than ±50 μV were excluded from the averaging. To minimize subject habituation and drowsiness, we used short-term recording sessions for each subject. Each recording session included 40 trials for the four stimulus conditions, i.e., 10 trials for each condition. Each session took 80 s. The number of trials averaged per subject was more than one hundred for each condition. The number of sessions was 14 for one subject and 15 for nine subjects. The window of averaging was 100 ms before and 500 ms after the stimulus onset. A 100 ms pre-stimulus baseline was used for responses to the stimulus.
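The epoch handling described above (baseline correction, artifact rejection, averaging) can be sketched as follows. This is an illustrative reconstruction, not the EPlyzer procedure: the function name `average_epochs` is hypothetical, and interpreting "signal variations larger than ±50 μV" as the absolute baseline-corrected amplitude exceeding 50 μV is an assumption.

```python
def average_epochs(eeg_epochs, eog_epochs, n_baseline=100, reject_uv=50.0):
    """Average the epochs of one electrode and one condition.

    eeg_epochs, eog_epochs: lists of epochs, each a list of samples in
    microvolts. At 1,000 Hz one sample is 1 ms, so an epoch from -100 to
    +500 ms has 600 samples and the first n_baseline are pre-stimulus.
    Returns (average_waveform, number_of_epochs_kept).
    """
    def baseline_correct(epoch):
        base = sum(epoch[:n_baseline]) / n_baseline
        return [v - base for v in epoch]

    kept = []
    for eeg, eog in zip(eeg_epochs, eog_epochs):
        eeg_c = baseline_correct(eeg)
        eog_c = baseline_correct(eog)
        # Assumed rejection rule: drop the epoch if EEG or EOG leaves +/-50 uV.
        if max(abs(v) for v in eeg_c) > reject_uv:
            continue
        if max(abs(v) for v in eog_c) > reject_uv:
            continue
        kept.append(eeg_c)

    if not kept:
        return None, 0
    n = len(kept)
    # Sample-by-sample mean across the accepted epochs.
    average = [sum(samples) / n for samples in zip(*kept)]
    return average, n
```

As a consistency check on the design, 40 trials in 80 s gives 2 s per trial, which matches 1,500 ms of random dots plus 500 ms of stimulus. With 14 or 15 sessions of 10 trials per condition, 140 to 150 epochs per condition enter this step, which is in line with more than one hundred remaining after rejection.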

Data analysis

In a previous study using RDB (Okusa et al. 1998), a clear component peaking at around 200–300 ms was identified. In this study, we identified a component according to the following criteria: the peak deflection lies 200–300 ms after stimulus onset, and the deflection is significant [>2 standard deviations (SD) of the fluctuation level in the baseline period of each electrode]. The peak latency (the time after stimulus onset at which the amplitude of the ERP component was maximal) and the maximum amplitude of the component were measured individually for each condition within 300 ms. For the peak latency and maximum amplitude at the T5, T6, P3, P4, O1, and O2 electrodes, where a negative component was identified within 300 ms after the stimulus onset, a repeated measures two-way factorial ANOVA was performed within subjects with condition (Upright, Inverted, Scrambled, and Star) and electrode (T5, T6, P3, P4, O1, and O2) as factors. For the peak latency and maximum amplitude at Cz and Fz, where a positive component was identified within 300 ms after the stimulus onset, a repeated measures two-way ANOVA was performed within subjects with condition (Upright, Inverted, Scrambled, and Star) and electrode (Cz and Fz) as factors. p < 0.05 was considered significant. If the assumption of sphericity was violated in Mauchly’s sphericity test, the degrees of freedom were corrected using the Greenhouse–Geisser correction coefficient epsilon, and then F and p values were recalculated. The Tukey–Kramer multiple comparison test was used for post hoc analysis and p < 0.05 was considered significant.
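The component criteria above (a peak 200–300 ms after onset that exceeds 2 SD of the baseline fluctuation) can be written as a short sketch. The function name `find_component_peak`, the use of the sample standard deviation, and the `polarity` argument are assumptions for illustration; the source does not describe how the criteria were implemented.

```python
from statistics import stdev

def find_component_peak(avg, polarity, n_baseline=100,
                        window_ms=(200, 300), n_sd=2.0):
    """Locate a component peak in a baseline-corrected average waveform.

    avg: samples in microvolts at 1,000 Hz; index 0 is -100 ms, so
         stimulus onset is at index n_baseline.
    polarity: -1 for a negative component (e.g. at T5/T6),
              +1 for a positive component (e.g. at Cz/Fz).
    Returns (latency_ms, amplitude_uv), or None if no deflection in the
    window exceeds n_sd standard deviations of the baseline.
    """
    threshold = n_sd * stdev(avg[:n_baseline])

    start = n_baseline + window_ms[0]
    stop = n_baseline + window_ms[1] + 1  # include the 300 ms sample
    segment = avg[start:stop]

    # Flip the sign so that the peak of either polarity is a maximum.
    signed = [polarity * v for v in segment]
    peak_idx = max(range(len(signed)), key=signed.__getitem__)

    if signed[peak_idx] <= threshold:
        return None
    return window_ms[0] + peak_idx, segment[peak_idx]
```

Applied per subject, condition, and electrode, the returned latency and amplitude would be the values entered into the ANOVA.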

In the source analysis for grand-averaged ERPs, we used a multi-dipole, brain electric source analysis (BESA) (MEGIS Software, Germany) computation of theoretical source generators in a four-shell (brain, cerebrospinal fluid, bone, and scalp) ellipsoidal head model for EEG (Berg and Scherg 1994) within 200–300 ms. We first tried to estimate the dipole freely but could not obtain a physiologically plausible location. In previous RDB studies, the sources estimated from the evoked component were located in bilateral IT areas (Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007). Taking the results of previous RDB studies (Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007) into account, we calculated a two-source model for grand-averaged ERPs: (1) the right IT area and (2) the left IT area. Initially, we simultaneously placed each of the sources around the corresponding region. The BESA calculation allows some change in the initial location and freedom in the orientation of each source, so it is possible for each source to move to a nearby location, if a better fit to the data results.

Results

Figures 3 and 4 show the grand-averaged waveforms of ERPs after the stimulus onset. There was no component peaking at around 100 ms like P1 or P100, which is considered to reflect the activities of the primary visual areas in ERP studies (Figs. 3, 4). In addition, the N170 component of ERP, which is considered to be face-specific, was not identified (Figs. 3, 4). Instead, clear components peaking at around 250 ms, N-ERP250 and P-ERP250, were identified for all conditions. For the peak latency at the T5, T6, P3, P4, O1, and O2 electrodes, the two-way ANOVA showed the effect of condition (condition: F(3,27) = 6.947, p < 0.01) and there were no interactions. For maximum amplitude, the two-way ANOVA showed effects of condition and electrode (condition: F(3,27) = 6.215, p < 0.01; electrode: F(5,45) = 10.740, p < 0.01) and there was a significant condition–electrode interaction (F(15,135) = 5.925, p < 0.01).
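The degrees of freedom reported above can be checked against the design (10 subjects, 4 conditions, 6 electrodes). The sketch below computes the uncorrected degrees of freedom of a two-way fully within-subject ANOVA; the function name `rm_anova_df` is hypothetical, and a Greenhouse–Geisser correction, where applied, would multiply both values by epsilon.

```python
def rm_anova_df(n_subjects, n_conditions, n_electrodes):
    """Uncorrected (effect, error) degrees of freedom for a two-way
    repeated measures ANOVA with both factors within subjects."""
    a, b, n = n_conditions, n_electrodes, n_subjects
    return {
        "condition": (a - 1, (a - 1) * (n - 1)),
        "electrode": (b - 1, (b - 1) * (n - 1)),
        "interaction": ((a - 1) * (b - 1), (a - 1) * (b - 1) * (n - 1)),
    }

# Posterior electrodes: T5, T6, P3, P4, O1, O2
posterior = rm_anova_df(n_subjects=10, n_conditions=4, n_electrodes=6)
# Midline electrodes: Cz, Fz
midline = rm_anova_df(n_subjects=10, n_conditions=4, n_electrodes=2)
```

For the posterior electrodes this gives (3, 27) for condition, (5, 45) for electrode, and (15, 135) for the interaction, matching the reported F statistics.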

Fig. 3

The grand-averaged waveforms of ERP evoked by the Upright, Inverted, Scrambled, and Star conditions at T5 and T6 electrodes. To show each waveform and the differences among waveforms more clearly, grand-averaged waveforms were filtered (0.1–30 Hz). Blue line is Upright, green is Inverted, red is Scrambled, and light blue is Star

Fig. 4

The grand-averaged waveforms of ERP evoked by the Upright, Inverted, Scrambled, and Star conditions at P3, P4, O1, O2, Cz and Fz. The legend is identical to that of Fig. 3

T5 and T6 electrodes (temporal area)

A clear negative component peaking at around 250 ms, N-ERP250, was identified at both the T5 and T6 electrodes (Fig. 3).

At the T5 and T6 electrodes, the latency of N-ERP250 was significantly longer for Inverted (T5: p < 0.01, T6: p < 0.05) and Scrambled (T5: p < 0.01, T6: p < 0.05) than for Upright (Fig. 3; Table 1).

Table 1 The latency and amplitude of N-ERP250 at T5 and T6 (the temporal area)

At the T5 electrode, maximum amplitude was larger for Upright, Inverted, and Scrambled than for Star, although there were no significant differences among the conditions (Fig. 3; Table 1). On the other hand, at the T6 electrode, maximum amplitude was significantly larger for Upright, Inverted, and Scrambled than for Star (Upright: p < 0.05; Inverted and Scrambled: p < 0.01) (Fig. 3; Table 1).

P3 and P4 electrodes (parietal area)

N-ERP250 was also identified at both the P3 and P4 electrodes (Fig. 4). At the P3 electrode, the peak latency of N-ERP250 was significantly longer for Scrambled and Star than for Upright (p < 0.05). At the P4 electrode, the latency was significantly longer for Inverted than for Upright (p < 0.05) (Fig. 4; Table 2).

Table 2 The latency and amplitude of N-ERP250 at P3 and P4 (the parietal area)

For maximum amplitude, there were no significant differences among the conditions at the P3 electrode. On the other hand, at the P4 electrode, maximum amplitude was significantly larger for Scrambled than for Upright (p < 0.01), Inverted (p < 0.05) and Star (p < 0.05) (Fig. 4; Table 2).

O1 and O2 electrodes (occipital area)

N-ERP250 was also identified at both the O1 and O2 electrodes (Fig. 4). At the O1 and O2 electrodes, there were no differences in peak latency among the conditions (Fig. 4; Table 3). However, maximum amplitude was significantly larger for Scrambled than for Upright (p < 0.05) and Star (p < 0.01), and for Inverted than for Star (p < 0.05) at both electrodes (Fig. 4; Table 3). Moreover, at the O2 electrode, maximum amplitudes for all conditions were larger than at the O1 electrode (Upright, Inverted, and Star: p < 0.05; Scrambled: p < 0.01) (Fig. 4; Table 4).

Table 3 The latency and amplitude of N-ERP250 at O1 and O2 (the occipital area)
Table 4 The latency and amplitude of P-ERP250 at Cz and Fz (the vertex and frontal area)

Cz and Fz electrodes (vertex and frontal area)

There was a positive signal at both the Cz and Fz electrodes peaking at around 250 ms, P-ERP250, after the stimulus onset (Fig. 4; Table 4). There were no differences among the conditions in the peak latency but, for maximum amplitude, the two-way ANOVA showed effects of condition (condition: F(3,27) = 3.672, p < 0.05) at the Cz and Fz electrodes (Fig. 4; Table 4). Maximum amplitude was larger for Upright than for Star at both electrodes (p < 0.05) (Fig. 4; Table 4).

The differences among temporal, parietal, and occipital areas

In the left hemisphere, for the Upright and Inverted conditions, maximum amplitude was significantly larger in the occipital area (O1) than parietal area (P3) (Upright: p < 0.01, Inverted: p < 0.01). For Scrambled and Star, maximum amplitude was significantly larger in the occipital area (O1) than temporal area (T5) (Scrambled and Star: p < 0.01) and parietal area (P3) (Scrambled: p < 0.01, Star: p < 0.05).

In the right hemisphere, the maximum amplitude in the occipital area (O2) was significantly larger than that in the temporal (T6) (Upright: p < 0.05, Inverted: p < 0.05, Scrambled: p < 0.01, Star: p < 0.01) and parietal (P4) (Upright: p < 0.01, Inverted: p < 0.01, Scrambled: p < 0.01, Star: p < 0.01) areas for each condition.

Topographical maps

Figure 2b shows topographical maps of the grand-averaged ERPs for all conditions averaged for the time frame of 235–270 ms. In all conditions, the maps showed negativity in the bilateral occipito-parietal areas and positivity in the vertex and frontal areas. The maps for Upright, Inverted, and Scrambled were more negative in the bilateral temporal areas than the map for Star, while the map for Scrambled was more negative in the bilateral occipito-parietal areas than the maps for Upright and Inverted. The map for Upright was more positive in the vertex and frontal area than the map for Star.

N-ERP250 and P-ERP250 source analysis

We estimated the sources of N-ERP250 and P-ERP250 from the grand-averaged ERPs within 200–300 ms by using BESA, as described in “Methods” (Fig. 5), and evaluated each model by its goodness-of-fit (GoF), which indicates the percentage of the data that can be explained by the model. A two-source model showed very high and reliable GoF values in all the conditions (Upright: GoF = 96.3%, Inverted: GoF = 96.4%, Scrambled: GoF = 97.6%, Star: GoF = 97.0%). Adding other sources, for example, in the occipital regions, resulted in little increase in the GoF value.
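To make the GoF figure concrete, the sketch below uses the residual-variance definition that is common in dipole modeling: GoF is 100% minus the share of the measured signal power that the model leaves unexplained. The source only states that GoF is the percentage of the data explained by the model, so the exact formula and the function name `goodness_of_fit` are assumptions.

```python
def goodness_of_fit(measured, modeled):
    """Goodness-of-fit in percent between measured and model-predicted data.

    measured, modeled: lists of channels, each a list of samples (uV)
    over the fitted interval (here it would be 200-300 ms, 19 channels).
    Assumed definition: GoF = 100 * (1 - residual power / measured power).
    """
    ss_residual = sum((d - m) ** 2
                      for d_ch, m_ch in zip(measured, modeled)
                      for d, m in zip(d_ch, m_ch))
    ss_measured = sum(d ** 2 for d_ch in measured for d in d_ch)
    return 100.0 * (1.0 - ss_residual / ss_measured)
```

Under this definition, a GoF of 96.3% means that about 3.7% of the signal power in the fitted interval is not accounted for by the two IT dipoles, which is why adding further sources could raise the value only slightly.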

Fig. 5

The N-ERP250 and P-ERP250 sources estimated from grand-averaged waveforms of Upright overlaid on standardized MR images (axial and coronal images). The red circle is the source estimated in the right inferior temporal (IT) area and the blue circle is the one in the left IT area. Those for Inverted, Scrambled and Star were estimated at almost the same location. Their Talairach coordinates were as follows: Upright: right: x = 32.8 mm, y = −56.7 mm, z = −9.7 mm; left: x = −24.9 mm, y = −58.6 mm, z = −9.2 mm, Inverted: right: x = 32.1 mm, y = −60.6 mm, z = −10.1 mm; left: x = −23.6 mm, y = −65.6 mm, z = −10.0 mm, Scrambled: right: x = 31.4 mm, y = −62.2 mm, z = −10.3 mm; left: x = −22.2 mm, y = −60.1 mm, z = −10.1 mm, and Star: right: x = 26.3 mm, y = −67.0 mm, z = −13.1 mm; left: x = −26.4 mm, y = −68.5 mm, z = −6.3 mm

The estimated sources for each condition were located in the IT area and their Talairach coordinates were as follows (Fig. 5): Upright: right: x = 32.8 mm, y = −56.7 mm, z = −9.7 mm; left: x = −24.9 mm, y = −58.6 mm, z = −9.2 mm, Inverted: right: x = 32.1 mm, y = −60.6 mm, z = −10.1 mm; left: x = −23.6 mm, y = −65.6 mm, z = −10.0 mm, Scrambled: right: x = 31.4 mm, y = −62.2 mm, z = −10.3 mm; left: x = −22.2 mm, y = −60.1 mm, z = −10.1 mm, and Star: right: x = 26.3 mm, y = −67.0 mm, z = −13.1 mm; left: x = −26.4 mm, y = −68.5 mm, z = −6.3 mm. Figure 5 shows the estimated sources for the Upright condition overlaid on standardized MR images; those for the Inverted, Scrambled, and Star conditions were estimated at almost the same location. They appear to be located in the fusiform gyrus of the bilateral hemispheres.

Discussion

In a previous study with magnetoencephalography (MEG), Okusa et al. (1998) found that one clear component peaking at ~300 ms was evoked by a letter or character using RDB, which can reduce the activities in the primary visual areas, and its source was estimated to be located in the IT area. In fact, all previous RDB studies reported that there was no significant response of the primary visual areas but that a clear component was evoked by the activity of the higher visual area related to the perception of visual shapes, for example a letter or character (Okusa et al. 1998, 2000; Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007). Therefore, they speculated that the primary visual areas were continuously activated by RDB, and so a time-locked component was not evoked. In contrast, a clear component generated in higher visual areas was time-locked to the time when the subject detected what the visual stimulus was, for example, a face in the present study. In addition, a recent fMRI study showed that the activities in the lower visual areas (e.g., Brodmann area 17/18) were greatly attenuated for RDB compared to luminance-defined stimuli, whereas neural responses in the higher visual regions such as the fusiform gyrus were mostly identical (Noguchi et al. 2007). In the present study, clear components, N-ERP250 and P-ERP250, were evoked by all conditions and their sources were estimated to be in bilateral IT areas around the fusiform gyrus as in previous RDB studies (Okusa et al. 1998, 2000; Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007) and M170 in MEG studies (Deffke et al. 2007; Halgren et al. 2000; Swithenby et al. 1998; Tanskanen et al. 2005; Watanabe et al. 1999a, b).
Moreover, there was no positive component like P1 or P100, which is considered to be evoked by various visual stimuli about 100 ms after stimulus onset and to reflect the activities of the primary visual areas, and there was no negative component like N170, which is considered to be evoked by face stimuli about 170 ms after stimulus onset and to reflect the activities of higher visual areas, for example, the STS area and IT area (Watanabe et al. 2003; Itier and Taylor 2004c) in ERP studies. But a clear negative component, N-ERP250 at the O1 and O2 electrodes, was observed and its source was estimated to be in the bilateral IT areas. These results were compatible with those of previous RDB studies.

One previous study using intracranial ERP reported that the first distinct response in visual ventral and lateral areas appears ~200 ms after stimulus onset (Allison et al. 1999). In contrast, N-ERP250 and P-ERP250 in this study were observed at a latency of 250 ms, about 50 ms later than in intracranial ERP. In a previous MEG study, Schoenfeld et al. (2003) used two types of stimuli, visual shape stimuli formed by luminance change and formed by motion coherence of random dots, to observe the processing of a visual form defined by luminance or motion cues. They then reported that shape stimuli defined by luminance change activated the lateral occipital (LO) and IT areas directly from the primary visual area, for example V1, but shape stimuli defined by motion coherence of random dots activated LO and IT areas indirectly from V1 and after MT/V5 activity. As a result, the response latency in LO and IT was delayed by 50–60 ms in response to motion-defined rather than luminance-defined shapes, despite the same response latency in early visual areas. Therefore, it is possible for RDB stimuli to be processed in several areas (in the dorsal stream) before reaching the IT area, resulting in some delay of activity in the visual ventral pathway, although it is unclear where our RDB stimuli are processed between V1 and IT, as mentioned in a previous RDB study (Noguchi et al. 2004). In addition, the duration of these components in our study is much longer than that of N170, and this may be due to latency jitter between trials and/or subjects.

In the current study, (1) the amplitude of N-ERP250 was significantly larger for Upright than for Star at the T6 electrode, and (2) the amplitude of P-ERP250 was significantly larger for Upright than for Star at the Cz and Fz electrodes. In previous EEG studies, N170 and the positive component at Cz, the vertex positive potential (VPP), were larger for faces than for non-faces, for example cars (e.g., Rossion and Jacques 2008). Our results were compatible with these previous studies. Therefore, we speculated as follows: (1) N-ERP250 and P-ERP250 appear to reflect activity mainly in the IT area, and (2) N-ERP250 amplitude differences appear to reflect differences between face perception and non-face perception processes.

There was the question of whether Inverted and Scrambled in this study were perceived as face or non-face. However, the amplitude of N-ERP250 for Inverted and Scrambled was almost the same as that for Upright but significantly larger than that for Star, which was clearly non-face, at the T6 electrode as in previous studies (e.g., Rossion and Jacques 2008). Therefore, Inverted and Scrambled might have been perceived as a kind of face rather than non-face.

P1 or P100 is considered to be evoked by various visual stimuli about 100 ms after stimulus onset and to reflect the activities of primary visual areas. Recent studies suggested that the positive component (P1 or P100) also reflects face-specific processing and is susceptible to face inversion (Doi et al. 2007; Thierry et al. 2007); therefore, we could not exclude the possibility that N-ERP250 and P-ERP250 are contaminated by activities in low-order visual areas. However, the source of N-ERP250 and P-ERP250 was estimated to be in the bilateral IT, and moreover, the GoF of the estimated dipole model was very high and reliable. In addition, our previous studies using RDB stimuli did not show activity in low-order visual areas but only in IT areas (Okusa et al. 1998, 2000; Noguchi et al. 2004; Noguchi and Kakigi 2005, 2006; Noguchi et al. 2007); therefore, we thought that the major source of N-ERP250 and P-ERP250 was the IT area and these components mainly reflected IT activity.

In previous ERP studies, the peak latency of N170 was significantly longer for configural distortions [inverted faces (Bentin et al. 1996; Honda et al. 2007; Itier and Taylor 2004a; Itier et al. 2006; Sagiv and Bentin 2001; Watanabe et al. 2003, 2005), Thatcherized faces (Carbon et al. 2005), and scrambled faces (George et al. 1996)] than for upright faces. In addition, Latinus and Taylor (2006) found that photographic and schematic faces evoked a similar N170 when the upright face was presented, but the N170 was increased in latency when the inverted face was presented. They speculated that photographic and schematic faces initially engaged first-order relation (eyes above nose, above mouth) processing followed by holistic processing and, moreover, that the involvement of additional analytic processing (extraction of detailed information, which was invoked for a feature by feature analysis of stimuli within the context of facial configuration) would produce the delayed N170s for inverted photographic and schematic faces. Considering the findings mentioned above, the delay of peak latency for Inverted and Scrambled compared to Upright in this study indicated that the involvement of additional analytic processing for Inverted and Scrambled stimuli would produce the delayed N-ERP250. Even if the spatial relation between the facial contour and features was intact, the configural distortion may engage additional analytic processing of face perception in the IT area. However, the amplitudes at T5 and T6 electrodes were larger for Inverted and Scrambled than for Upright, although there were no significant differences among Upright, Inverted and Scrambled, unlike in previous N170 studies (George et al. 1996; Rossion et al. 2003). Therefore, we considered that amplitude differences among Upright, Inverted and Scrambled conditions were weak compared with previous ERPs, probably because of latency jitter between trials and/or subjects, although this cannot be confirmed from the present study.

However, compared with the N170 evoked by schematic faces and various face stimuli in previous studies (Jacques and Rossion 2006; Latinus and Taylor 2006; Sagiv and Bentin 2001), the N-ERP250 showed several differences as follows: (1) The N-ERP250 in this study had a longer latency than the N170 in previous studies and moreover no negative component about 170 ms after stimulus onset appeared in this study, although schematic faces were used in all these studies. (2) In a previous ERP study using a continuous-stimulation paradigm, in which the low-level features remained constant, leading to the absence of the earlier visual component, N170 was identified (Jacques and Rossion 2006) and its amplitude was more closely related to high-level processes than low-level visual features of stimuli in face perception (Rossion and Jacques 2008), but in this study, N170 was not identified. (3) The amplitude of N-ERP250 was smaller than those of most previous N170s. (4) In the right hemisphere, the amplitude of N-ERP250 was smaller in the temporal than occipital area, whereas that of N170 was larger in the temporal than occipital area. (5) The source of N-ERP250 was located in the bilateral IT areas whereas sources of N170 were located in higher visual areas, for example, STS and IT areas (Watanabe et al. 2003; Itier and Taylor 2004c).

When using conventional ERP methods, a clear negative component peaking at about 250 ms, N250, is also identified following N170. In previous studies, N250 was sensitive to immediate face repetition (Schweinberger et al. 2002), face identity (Tanaka et al. 2006) and face inversion (Itier and Taylor 2004b). The N-ERP250 in this study showed several similarities to N250: (1) The sources of these components are the same: temporal ventral areas (Itier and Taylor 2004b; Schweinberger et al. 2002), probably the fusiform gyrus. (2) N250 was observed at the PO9/PO10 and TP9/TP10 electrodes (Itier and Taylor 2004b); in the present study, N-ERP250 was measured at parietal, occipital and temporal electrodes.

In conclusion, there are three possible mechanisms generating N-ERP250. The first hypothesis is that N-ERP250 in this study is identical with the N250 of conventional face-related ERP given its latency. In this case, N170 may be absent because face holistic processing, which N170 reflects, is suppressed when using RDB without any changes in luminance.

The second hypothesis is that N-ERP250 is identical to N170, but its latency was delayed due to our unique method.

The third hypothesis is that N-ERP250 does not reflect either N170 or N250, in other words, it is a new component generated by RDB because amplitude and latency of N-ERP250 were much different from those of N170/N250.

The amplitude of N-ERP250 was significantly larger for Scrambled than for Upright and Inverted in the right parietal area and than for Upright in the bilateral occipital areas. These results showed that distortion in the spatial relation between the contour and features might cause additional processing of face perception, i.e., processing of the second-order relations which refer to the spatial distance among internal features (Maurer et al. 2002), and the increase in the amplitude of N-ERP250 in this study reflected such processing, although it is difficult to confirm this important point.

This is the first report of an ERP component evoked using RDB, N-ERP250, reflecting only the activity in the IT area modulated by not only face configural distortion but also the distortion in the spatial relation between the contour and features, different from N170.