1 The Visual System

The neurophysiological properties of the visual system have been investigated in detail in animals such as monkeys and cats. Many cortical areas have been identified, including the IT (inferior temporal) area, the STS (superior temporal sulcus) area, and the MT (middle temporal)/V5 area. Previous studies using noninvasive neuroimaging methods revealed that the visual system in humans is similar to that in monkeys [1, 2].

The human visual system consists of two main separate pathways [3, 4]. The first pathway begins with the cones in the retina, which process color; their signals reach the primary visual (striate) cortex via the parvocellular layers of the lateral geniculate nucleus (LGN). This pathway is called the “ventral (what) pathway”; it processes shapes and objects and plays a role in color perception. The second pathway begins with the rods in the retina, which process lightness; their signals reach the primary visual cortex via the magnocellular layers of the LGN. This pathway is called the “dorsal (where) pathway”; it processes location and motion and also plays a role in three-dimensional shape perception [5].

Magnetoencephalography (MEG) offers very high temporal resolution together with good spatial resolution. Previous MEG studies revealed the temporal and spatial characteristics of the human visual system. In MEG studies of the primary visual area, evoked magnetic components were identified using checkerboard pattern reversal stimuli [6–8]. In a previous MEG study [6], N75m, P100m, and N145m were clearly observed around the midoccipital position, while a very small component, P50m, was occasionally detected prior to N75m. The equivalent current dipoles (ECDs) of the P50m, N75m, and P100m components were estimated to lie in the striate cortex, while the ECD of N145m was estimated to lie in the extrastriate cortex.
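As an illustration of how such components are typically read off an averaged response, the sketch below locates peaks of the expected polarity within fixed latency windows. The waveform is simulated and the window boundaries are illustrative assumptions, not values taken from the cited studies.

```python
import numpy as np

# Simulated averaged evoked response (single channel, fT), 1 kHz sampling.
# The three deflections stand in for N75m, P100m, and N145m.
fs = 1000.0
t = np.arange(0.0, 0.300, 1.0 / fs)          # 0-300 ms after reversal
gauss = lambda mu, sd: np.exp(-0.5 * ((t - mu) / sd) ** 2)
evoked = (-40 * gauss(0.075, 0.008) + 90 * gauss(0.100, 0.010)
          - 60 * gauss(0.145, 0.015))

# Illustrative search windows (s) and expected polarity for each component.
windows = {"N75m": (0.060, 0.090, -1), "P100m": (0.085, 0.120, +1),
           "N145m": (0.125, 0.170, -1)}

for name, (lo, hi, sign) in windows.items():
    mask = (t >= lo) & (t <= hi)
    idx = np.argmax(sign * evoked[mask])      # extremum of the right polarity
    print(f"{name}: {t[mask][idx] * 1e3:.0f} ms, {evoked[mask][idx]:.1f} fT")
```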

P100m is the most important biomarker of primary visual cortex activity among the components evoked by pattern reversal stimuli. In an MEG study of responses to left half-field pattern reversal stimulation of the central field (0–2°, 0–5°) and the peripheral field (2–15°, 5–15°) [7], the P100m dipole for the central field was localized more posteriorly than that for peripheral stimulation. These findings were attributed to the retinotopic organization of the visual cortex. P100m is also affected by the properties of the stimuli, for example, the check size. Nakamura et al. [8] used half-field stimuli with or without central occlusion, with check sizes of 15′, 30′, 60′, 90′, and 180′ of visual arc, and recorded the P100m component. With central occlusion, but not without it, the latencies for the smaller checks were significantly longer, and the amplitudes significantly smaller, than those for the larger checks. However, the equivalent current dipoles were located around the calcarine fissure, and their locations did not differ significantly with check size.
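Because check sizes are specified in minutes of visual arc, reproducing such stimuli requires converting angular size into on-screen pixels. The short sketch below shows the standard geometry; the viewing distance and display resolution are hypothetical values, not the setup used in [8].

```python
import math

def arcmin_to_pixels(arcmin: float, viewing_distance_cm: float,
                     pixels_per_cm: float) -> float:
    """Convert a visual angle in arcminutes to a size in screen pixels."""
    angle_rad = math.radians(arcmin / 60.0)      # arcmin -> deg -> rad
    size_cm = 2.0 * viewing_distance_cm * math.tan(angle_rad / 2.0)
    return size_cm * pixels_per_cm

# Hypothetical setup: 57 cm viewing distance, 40 px/cm display.
for check in (15, 30, 60, 90, 180):              # check sizes used in [8]
    print(f"{check:3d}' -> {arcmin_to_pixels(check, 57.0, 40.0):6.1f} px")
```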

Previous MEG studies demonstrated that the perception of characters and shapes was related to the ventral pathway, including the IT area [9–12]. Koyama et al. [9] recorded MEG in Japanese subjects in order to investigate the recognition of Japanese characters (kanji and kana) and letters of the alphabet. The first evoked component peaked approximately 180 ms after stimulus onset under all conditions, and the estimated ECDs were located in the IT area. These findings revealed that the IT area is essential for character discrimination.

Since the activity of the primary visual areas in response to luminance changes is strong and long-lasting, it is frequently difficult to isolate IT area activity, even with the high spatial resolution of MEG. To overcome this issue, we developed the random dots blinking (RDB) method [10, 13–17], which uses temporal changes in the patterns of a large number of small dots to present an object (e.g., a circle, a letter, or a schematic face) without any change in luminance. This reduces activity in the primary and other higher visual areas and isolates activity in the IT area. Okusa et al. [10] investigated the visual recognition of characters using the RDB method. One clear component, with a peak latency of approximately 300 ms, was identified, and the discrimination accuracy rate increased as the character display duration lengthened. The source of this component was estimated to lie in the IT area. These findings showed that the RDB method is useful for investigating IT area activity in relation to visual perception.
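The following sketch illustrates one way to realize the core idea of the RDB method as described above: every frame contains the same number of dots, so mean luminance stays (nearly) constant, and the object is conveyed only by which dots keep their positions across frames. The frame count, dot density, and circular object mask are illustrative assumptions, not the published parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, N_DOTS, N_FRAMES = 128, 128, 800, 30   # illustrative parameters

# Object mask: a centered circle; only dot *stability*, not luminance,
# distinguishes it from the background.
yy, xx = np.mgrid[0:H, 0:W]
mask = (yy - H // 2) ** 2 + (xx - W // 2) ** 2 < 30 ** 2

def random_dots(n):
    return rng.integers(0, H, n), rng.integers(0, W, n)

# Dots inside the object keep fixed positions across frames; background
# dots are re-randomized every frame, so the dot count never changes.
oy, ox = random_dots(N_DOTS)
inside = mask[oy, ox]
frames = []
for _ in range(N_FRAMES):
    by, bx = random_dots(N_DOTS)             # fresh background candidates
    y = np.where(inside, oy, by)
    x = np.where(inside, ox, bx)
    img = np.zeros((H, W), dtype=np.uint8)
    img[y, x] = 1
    frames.append(img)

# Mean luminance is nearly constant across frames (overlap aside).
print([int(f.sum()) for f in frames[:5]])
```

Because each frame carries the same number of dots, there is no luminance transient at object onset, which is the property the method exploits to suppress primary visual cortex responses.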

Apparent motion is the perception of realistic smooth motion of an object that first flashes at one place and then at another. Kaneoke et al. [18] reported a localized cortical area, corresponding to the human homologue of MT/V5, whose response to apparent motion stimuli was identical to that for smooth motion. Kawakami et al. [19] measured magnetic responses to apparent motion stimuli and compared the findings with a subjective rating of the quality of the perceived motion at various stimulus timings. The strength (ECD moment) of the response varied with the stimulus timing, with the maximum value at an inter-stimulus interval of 0 ms, and was related to the subjective rating of the quality of motion. Another MEG study reported that the response to downward motion in the upper visual field was significantly larger than that to upward motion, but not in the lower visual field [20]. These findings indicated that human MT/V5 has a directional preference for downward over upward motion in the upper visual field.
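As a concrete illustration of the timing manipulation in [19], the sketch below generates onset/offset schedules for a two-flash apparent motion sequence at several inter-stimulus intervals (ISIs); the stimulus durations and ISI values are hypothetical, not those of the study.

```python
# Frame schedule for two-flash apparent motion: S1 at position A, then S2
# at position B after a variable inter-stimulus interval (ISI). An ISI of
# 0 ms means S2 replaces S1 immediately. All durations are illustrative.
S1_DURATION_MS = 100
S2_DURATION_MS = 100

def schedule(isi_ms: int) -> dict:
    s1_on, s1_off = 0, S1_DURATION_MS
    s2_on = s1_off + isi_ms
    s2_off = s2_on + S2_DURATION_MS
    return {"S1": (s1_on, s1_off), "S2": (s2_on, s2_off)}

for isi in (0, 50, 100, 200):                 # ms, illustrative values
    print(f"ISI {isi:3d} ms -> {schedule(isi)}")
```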

2 Face Perception

2.1 Static Face Perception

The face conveys a wealth of information relevant to our daily lives, such as age, sex, and familiarity, and plays an important role in social communication. Recent studies investigated static face perception using neuroimaging methods, including electroencephalography (EEG), MEG, and functional magnetic resonance imaging (fMRI). For example, EEG demonstrated that a negative component was evoked at approximately 170 ms during object perception, and this component was termed N170 [21, 22]. N170 was shown to be larger during the viewing of faces than during the observation of other objects, such as cars or chairs [23].

The temporal and spatial processing of face perception in 12 normal subjects was previously investigated by MEG [11, 12]. Five kinds of stimuli were used: (1) a face with opened eyes, (2) a face with closed eyes, (3) eyes, (4) a scrambled face, and (5) a hand. Subjects were asked to count the hand stimuli and report the total after each session, so that attentional effects (and the associated P300 component) would not contaminate the responses to the face and eyes stimuli.

Two components, 1M and 2M, were identified in the right and left hemispheres, with mean peak latencies of approximately 132 and 179 ms, respectively (Fig. 7.1). The 1M component was recorded in all subjects for all stimulus types and appeared as a simple phase reversal. The 2M component was clearly identified from the right hemisphere in ten out of the 12 subjects and appeared to overlap with the 1M component in five subjects. The 2M component was clearly observed in response to the face with opened eyes and the face with closed eyes in all ten subjects, but to the eyes in only three subjects, with a smaller amplitude; it was not observed in response to the scrambled face or the hand. From the left hemisphere, the 2M component was clearly identified in five of the ten subjects in response to the face stimuli (with opened and closed eyes) and in two subjects in response to the eyes stimuli. The 2M latency to the eyes stimuli was significantly longer than those to the face with opened eyes and the face with closed eyes, whereas no significant difference was observed in the 2M latency between the two face conditions. The source analysis revealed that the 1M component was generated in the primary visual cortex in both hemispheres, while the 2M component was generated in the inferior temporal cortex around the fusiform gyrus (Fig. 7.2). These findings suggested that (1) the fusiform gyrus plays an important role in the holistic and/or configural processing underlying human static face perception and (2) the right hemisphere is dominant for this processing.

Fig. 7.1
figure 1

MEG waveforms of a representative subject in response to five categories of stimuli: face with opened eyes, face with closed eyes, eyes, scrambled face, and hand. MEG waveforms recorded from 37 channels from the right hemisphere were superimposed. Two components, 1M and 2M, were identified. The 1M component was recorded for all stimuli, whereas the 2M component was not recorded for scrambled face or hand. The 2M latency to eyes stimuli was significantly longer than those to face with opened eyes and face with closed eyes. No significant difference was observed in the 2M latency between face with opened eyes and face with closed eyes (Adapted from [11])

Fig. 7.2
figure 2

The ECD of the 2M component to the face with opened eyes recorded from the right hemisphere, overlaid on two-dimensional and three-dimensional MRI in a representative subject (Adapted from [12])

The face inversion effect is a phenomenon unique to humans and nonhuman primates: psychological studies have reported that face recognition is more difficult when faces are presented inverted rather than upright. These findings suggested that face inversion disrupts the holistic and/or configural processing of facial information [24, 25].

In EEG studies, N170 was shown to be affected by inversion. In previous studies [21, 26–28], N170 was longer in latency and larger in amplitude for inverted faces than for upright faces. MEG studies have also examined face inversion effects. Watanabe et al. [29] reported that the latency of the component evoked by inverted faces was longer in the right hemisphere and shorter in the left hemisphere than that evoked by upright faces. In addition, the latency of N170 was longer for scrambled features (e.g., eyes, nose, and mouth), in which the spatial relationship between the facial contours (e.g., the hair, chin, and ears) and features was disrupted [22, 27], than for upright faces. In our MEG study [30], we investigated (a) how brain activity related to static face perception is modulated by inverting the contours and/or features of the human face and (b) what information within the face is important for processing static face perception. We used MEG and compared the cortical activities evoked by viewing an upright face, an inverted face, and a face in which the spatial relationship between the contours and features was disrupted. We focused on the activity in the fusiform gyrus for static face perception. We studied ten right-handed adults with normal or corrected visual acuity. We used the following three conditions (Fig. 7.3):

Fig. 7.3
figure 3

Examples of stimulus conditions. (1) U&U, upright contour (hair and chin) and upright features (eyes, nose, and mouth); (2) U&I, upright contour and inverted features; and (3) I&I, inverted contour and inverted features (Adapted from [30])

1. U&U (upright contours and upright features): Contours (hair and chin) and features (eyes, nose, and mouth) upright.

2. U&I (upright contours and inverted features): Contours remained upright, while the features were mirrored and inverted relative to the U&U condition; the spatial relationship among the eyes, nose, and mouth was unchanged (a minimal image-manipulation sketch follows this list).

3. I&I (inverted contours and inverted features): A mirrored and inverted form of the image used in the U&U condition.
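For concreteness, the sketch below shows one way the U&I and I&I manipulations could be produced from a U&U image: mirroring plus inversion amounts to a 180° rotation, applied either to the feature region alone (U&I) or to the whole image (I&I). The image and the feature-box coordinates are placeholders, not the stimuli of [30].

```python
import numpy as np

def make_u_and_i(face: np.ndarray, feat_box: tuple) -> np.ndarray:
    """U&I: keep the contour upright but rotate the feature region 180 deg
    (mirrored and inverted), leaving the eye/nose/mouth layout intact."""
    top, bottom, left, right = feat_box
    out = face.copy()
    out[top:bottom, left:right] = face[top:bottom, left:right][::-1, ::-1]
    return out

def make_i_and_i(face: np.ndarray) -> np.ndarray:
    """I&I: the whole U&U image mirrored and inverted (180 deg rotation)."""
    return face[::-1, ::-1]

face = np.random.default_rng(1).random((256, 256))   # placeholder image
u_and_i = make_u_and_i(face, (80, 200, 60, 196))     # hypothetical feature box
i_and_i = make_i_and_i(face)
```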

Figure 7.4 shows the waveforms recorded from 204 gradiometers of a representative subject under the U&U (upright contour and upright features) condition, together with the waveforms under all conditions at representative sensors, namely those showing the largest U&U component in the occipital and temporal areas of the same subject. Under all conditions, the sources of the MEG responses were estimated to lie in the fusiform area. The latency of the fusiform activity was significantly longer for U&I (upright contour and inverted features; p < 0.05) and I&I (inverted contour and inverted features; p < 0.05) than for U&U in the right hemisphere, and significantly longer for U&I than for U&U (p < 0.01) and I&I (p < 0.05) in the left hemisphere (Table 7.1). No significant differences were observed in the dipole moment (strength) among the three conditions. These results demonstrated that, in static face perception, the activity of the right fusiform area was affected more by the inversion of features, whereas that of the left fusiform area was affected more by a disruption of the spatial relationship between facial contours and features.

Fig. 7.4
figure 4

The upper image shows the waveforms recorded from 204 gradiometers of a representative subject under the U&U (upright contour and upright features) condition. The head is viewed from the top. In each of the 102 pairs of gradiometers, the upper trace shows the field along the latitude of the gradiometers and the lower trace that along the longitude. The lower image shows enlarged waveforms at sensors A and B from the upper image, the sensors with the largest U&U component in the occipital and temporal areas, plotted in red for U&U, blue for U&I (upright contour and inverted features), and light blue for I&I (inverted contour and inverted features). (a) Waveforms at sensor A over the left hemisphere in the upper image. (b) Waveforms at sensor B over the right hemisphere in the upper image. Black arrows show the stimulus onset and white arrows the response chosen for further analyses. For this subject, responses after the stimulus onset were clearly longer in latency for U&I and I&I than for U&U in the right hemisphere, and longer for U&I than for U&U and I&I in the left hemisphere (Adapted from [30])

Table 7.1 Dipole moments (nAm) and peak latencies (ms) for U&U, U&I, and I&I. Means and standard deviations (SDs) for U&U, U&I, and I&I in the right and left hemispheres (Adapted from [30])
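For readers who want to reproduce this kind of condition-wise latency comparison, a minimal sketch follows. The per-subject latencies are randomly generated placeholders, and paired two-sided t-tests are used purely for illustration; the original study's statistical procedure may have differed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 10                                   # subjects, as in [30]
# Placeholder per-subject fusiform latencies (ms); NOT the published data.
lat = {"U&U": rng.normal(160, 10, n),
       "U&I": rng.normal(172, 10, n),
       "I&I": rng.normal(170, 10, n)}

for a, b in [("U&U", "U&I"), ("U&U", "I&I"), ("U&I", "I&I")]:
    t, p = stats.ttest_rel(lat[a], lat[b])   # paired, two-sided
    print(f"{a} vs {b}: t = {t:+.2f}, p = {p:.3f}")
```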

2.2 Facial Movement Perception

Facial movements are also useful for social communication in humans. For example, the direction of the eye gaze is used to assess the social attention of others, and it becomes markedly easier to understand speech when the mouth movements of the speaker can be observed. In a previous MEG study [31], a region specific to the perception of eye movements was detected within the occipitotemporal area corresponding to human MT/V5, and its response differed from that to motion in general. We examined the temporal characteristics of the brain activity elicited by viewing mouth movements (opening and closing) and compared them to those elicited by eye aversion movements and by motion in general [32]. Seventeen right-handed adults with normal or corrected visual acuity participated in this study. We used apparent motion, in which the first stimulus (S1) was replaced by a second stimulus (S2) with no inter-stimulus interval, under the following conditions:

1. M-OP: The mouth is opening.

2. M-CL: The mouth is closing.

3. EYES: The eyes are averted.

4. RADIAL: A radial stimulus moving inward.

A large clear component, 1M, was elicited under all conditions (M-OP, M-CL, EYES, and RADIAL) within 200 ms of the stimulus onset (Fig. 7.5). The means and standard deviations of the 1M peak latency were 159.8 ± 17.3, 161.9 ± 15.0, 161.2 ± 18.9, and 140.1 ± 18.0 ms for M-OP, M-CL, EYES, and RADIAL in the right hemisphere, respectively, and 162.4 ± 11.6, 160.9 ± 9.8, 164.6 ± 14.2, and 138.4 ± 9.0 ms in the left hemisphere, respectively. The latency for RADIAL was significantly shorter than that for the facial motion conditions (M-OP, M-CL, and EYES) (p < 0.05). No significant differences were observed in the 1M latency among M-OP, M-CL, and EYES. In the source analysis, the 1M sources under all conditions were estimated to lie in the occipitotemporal area, the human MT/V5 homologue (Fig. 7.6).

Fig. 7.5
figure 5

Right hemisphere MEG activity shown in a 37-channel superimposed display for all conditions. In a representative subject, 1M peak latencies were 154.8, 156.7, 162.5, and 148.1 ms for M-OP, M-CL, EYES, and RADIAL, respectively. Associated maximum root mean square (RMS) values were 62.6, 66.7, 122.0, and 119.4 fT (Adapted from [32])
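The RMS values quoted in this caption summarize field strength across sensors at each time point. A minimal sketch of that computation on simulated data follows; the channel count matches the 37-channel display above, but the signal and noise parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_channels, fs = 37, 1000.0                        # 37 channels, 1 kHz
t = np.arange(0.0, 0.300, 1.0 / fs)
# Simulated evoked field (fT): a ~160 ms component plus sensor noise.
signal = 100.0 * np.exp(-0.5 * ((t - 0.160) / 0.015) ** 2)
gains = rng.normal(0.0, 1.0, n_channels)[:, None]  # per-channel weights
data = gains * signal + rng.normal(0.0, 5.0, (n_channels, t.size))

rms = np.sqrt(np.mean(data ** 2, axis=0))          # RMS across channels
peak = np.argmax(rms)
print(f"peak RMS {rms[peak]:.1f} fT at {t[peak] * 1e3:.0f} ms")
```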

Fig. 7.6
figure 6

Right hemisphere locations for ECDs under the M-OP condition overlaid on axial, coronal, and sagittal MRI slices and the volume-rendered brain under M-OP, M-CL, EYES, and RADIAL conditions of a representative subject (Adapted from [32])

The means and standard deviations of the dipole moment of the dipoles estimated from 1M were 7.9 ± 1.9, 7.8 ± 3.2, 10.0 ± 6.8, and 13.8 ± 4.9 nAm for M-OP, M-CL, EYES, and RADIAL in the right hemisphere, respectively, and 7.4 ± 2.8, 6.7 ± 3.0, 9.3 ± 4.3, and 13.6 ± 1.8 nAm in the left hemisphere, respectively. No significant differences were observed in the dipole moment (strength) among M-OP, M-CL, and EYES in either hemisphere. However, the moments for M-OP and M-CL were significantly smaller than that for RADIAL (p < 0.05) in the right hemisphere, while those for M-OP, M-CL, and EYES were significantly smaller than that for RADIAL (p < 0.05) in the left hemisphere.

The results of this study indicated that the occipitotemporal (human MT/V5) area is active in the perception of both mouth and eye movements. Furthermore, viewing mouth and eye movements did not elicit significantly different activities in this area, which suggested that the movements of facial parts are processed in the same manner, one that differs from the processing of motion in general.

However, the main factor(s) causing the differences in the recognition of facial versus general movement have yet to be elucidated in detail. Many studies have investigated the effects of facial contours and features using static faces. A previous study reported that it took longer to recognize an eyes-only stimulus, or facial features alone (eyes, nose, and mouth), than a full-face stimulus with contours [11], indicating that the contours of the face are important in static face recognition. However, to the best of our knowledge, the effects of facial contours and features on facial movement recognition have not yet been investigated. We therefore examined the effects of facial contours and features on the early occipitotemporal activity evoked by facial movement [33]. In this study, we used a schematic face, because a simple schematic drawing with a circle for a contour, two dots for eyes, and a straight line for lips is recognized as a face even though each individual component of the drawing by itself is not. Previous studies using schematic faces showed that N170 was evoked by schematic faces as well as by photographs of real faces [27, 28].

Thirteen right-handed adults with normal or corrected visual acuity participated in this study. We used apparent motion, in which the first stimulus (S1) was replaced by a second stimulus (S2) with no inter-stimulus interval, and presented the following four conditions (Fig. 7.7). All subjects reported the smooth movement of the eyes or dots with this paradigm:

Fig. 7.7
figure 7

Examples of stimulus conditions. (1) CDL: schematic face consisting of a contour, two dots, and a horizontal line, (2) CD: the contour and two dots, (3) DL: two dots and a horizontal line, and (4) D: two dots only (Adapted from [33])

1. CDL: A schematic face consisting of a facial contour, two dots, and a horizontal line (see the composition sketch after this list)

2. CD: The contour and two dots

3. DL: Two dots and a horizontal line

4. D: Two dots only
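The sketch below shows how the four stimulus classes can be composed from their three primitives (a circular contour, two dots, and a horizontal line) as binary images; all sizes and positions are hypothetical, not those of the published stimuli.

```python
import numpy as np

H, W = 256, 256
yy, xx = np.mgrid[0:H, 0:W]

def circle_outline(cy, cx, r, width=2):
    """Ring-shaped mask: the facial contour."""
    d2 = (yy - cy) ** 2 + (xx - cx) ** 2
    return (d2 >= (r - width) ** 2) & (d2 <= (r + width) ** 2)

def dot(cy, cx, r=6):
    """Filled disc: one 'eye'."""
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2

def line(cy, cx, half_len=30, width=2):
    """Horizontal bar: the 'mouth'."""
    return (np.abs(yy - cy) <= width) & (np.abs(xx - cx) <= half_len)

contour = circle_outline(128, 128, 100)          # facial contour
dots = dot(100, 95) | dot(100, 161)              # two dots ("eyes")
mouth = line(170, 128)                           # horizontal line ("mouth")

conditions = {"CDL": contour | dots | mouth,     # full schematic face
              "CD":  contour | dots,
              "DL":  dots | mouth,
              "D":   dots}
images = {k: v.astype(np.uint8) * 255 for k, v in conditions.items()}
```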

The subjects described the movement as simple dot movement for D, but as eye movement for CDL, CD, and DL, even though the physical movement was identical under all conditions. In the source modeling, we used a single equivalent current dipole (ECD) model [34] within 145–220 ms of the stimulus onset.
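For orientation, the snippet below shows how a comparable single-ECD fit restricted to a 145–220 ms window can be set up with the open-source MNE-Python package and its bundled sample dataset. It illustrates the modeling approach only; it is not the software, data, or head model used in [33, 34].

```python
import mne

# Illustrative single-ECD fit on MNE-Python's sample dataset; the study
# itself used its own recordings and analysis software.
data_path = mne.datasets.sample.data_path()    # downloads on first call
meg_dir = data_path / "MEG" / "sample"

evoked = mne.read_evokeds(meg_dir / "sample_audvis-ave.fif",
                          condition="Left visual")
evoked.pick("meg")                             # fit MEG sensors only
cov = mne.read_cov(meg_dir / "sample_audvis-cov.fif")

# Spherical conductor model fitted to the sensor geometry; with a sphere
# model no MRI coregistration (trans) is required.
sphere = mne.make_sphere_model(r0="auto", head_radius="auto",
                               info=evoked.info)

# Restrict the fit to the 145-220 ms window, as in the study.
evoked.crop(tmin=0.145, tmax=0.220)
dip, residual = mne.fit_dipole(evoked, cov, sphere)

best = dip.gof.argmax()                        # time point with best fit
print(f"peak GOF {dip.gof[best]:.1f}% at {dip.times[best] * 1e3:.0f} ms, "
      f"moment {dip.amplitude[best] * 1e9:.1f} nAm")
```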

Clear MEG responses were elicited under all conditions (CDL, CD, DL, and D) at the sensors in the bilateral occipitotemporal area (Fig. 7.8). The means and standard deviations of the peak latency of the estimated dipole were 179.3 ± 26.3, 183.0 ± 16.9, 180.9 ± 20.8, and 180.3 ± 23.7 ms for CDL, CD, DL, and D in the right hemisphere, respectively, and 180.2 ± 14.9, 180.5 ± 24.8, 174.0 ± 24.9, and 177.7 ± 20.3 ms for CDL, CD, DL, and D in the left hemisphere, respectively. No significant differences were observed between any of these conditions.

Fig. 7.8
figure 8

Waveforms recorded from 204 gradiometers in a representative subject following the S2 onset (eye movement) under the CDL condition. The head is viewed from the top. The lower image shows waveforms at sensors L and R from the upper image, which showed a clear component in each hemisphere, plotted in red for CDL, blue for CD, light blue for DL, and green for D. L: representative waveforms at sensor L over the left hemisphere in the upper image. R: representative waveforms at sensor R over the right hemisphere in the upper image. Black arrows indicate the S2 onset. The main response after the S2 onset, indicated by white arrows, was larger in amplitude for CDL than for the other conditions. The polarity at each sensor may reflect the direction of the dipole current (Adapted from [33])

The means and standard deviations of the dipole moment were 14.4 ± 6.2, 11.2 ± 7.9, 9.6 ± 6.5, and 10.3 ± 5.3 nAm for CDL, CD, DL, and D in the right hemisphere, respectively, and 12.7 ± 6.7, 11.1 ± 6.1, 9.6 ± 5.4, and 8.9 ± 5.5 nAm for CDL, CD, DL, and D in the left hemisphere, respectively. The moment was significantly larger for CDL than for CD (p < 0.05), DL (p < 0.01), and D (p < 0.01) in the right hemisphere and for CDL than for DL and D (p < 0.01) in the left hemisphere.

The results of this study demonstrated specific information processing for eye movements that differs from the processing of motion in general, and showed that the activity in the occipitotemporal (human MT/V5) area related to this processing was influenced by whether the movements appeared together with the contours and/or features of the face.