Introduction

The idea of holistic processing, that certain visual stimuli are represented in terms of their undifferentiated wholes rather than via decomposition into parts, has proven highly influential in our understanding of visual object recognition. Originally associated with faces, holistic processing has since been reported for other categories of perceptual expertise (Bukach & Peissig, 2009; but see also Robbins & McKone, 2007). Further, holistic processing effects are stronger for face categories that have acquired familiarity through experience, including race (Rhodes, Locke, Ewing, & Evangelista, 2009), age (de Heering, Houthuys, & Rossion, 2007), and personal significance (e.g., celebrities; Harris & Aguirre, 2008a).

Although these data collectively shed light on holistic processing in the visual system, the emphasis on faces as a model stimulus class has limited our understanding of the nature of holistic representation. Unlike many other visual stimuli, faces are highly constrained in their spatial configuration and relatively sparse in terms of levels of part decomposition. While the basic parts of the face (eyes, nose, and mouth) have social relevance both for identification and emotion recognition, subcomponents of these parts (e.g., eyelashes, nostrils) have less functional significance (though see Sadr, Jarudi, & Sinha, 2003). Therefore, how the fine-grained organization of parts is encoded into holistic templates for visual recognition remains an open question.

Body postures present an excellent test case for this issue, as bodies share many of the ecologically relevant qualities of faces (Slaughter, Stone, & Reed, 2004). Alone or in combination with the face, the body can provide strong social cues as to the identity, emotions, and intentions of another. Thus, to the same extent that naïve observers are “face experts,” the majority of individuals are highly practiced at perceiving bodies as well (Reed & McIntosh, 2013; Reed, Nyberg, & Grubb, 2012). Consistent with these observations, neuroimaging studies have identified body-selective neural responses throughout the visual processing stream (Peelen & Downing, 2007).

Like faces, body postures appear to elicit configural or holistic processing at the level of the basic body structure. Similar to the finding that faces are recognized faster and more accurately when presented upright rather than inverted (Yin, 1969), a “body inversion effect” has been documented for postures (Reed, Stone, Bozova, & Tanaka, 2003; Reed, Stone, Grubb, & McGoldrick, 2006; Stein, Sterzer, & Peelen, 2012). If, as is commonly proposed, recognition of inverted faces relies on a separate part-based visual processing system (Maurer, Le Grand, & Mondloch, 2002), then by the same logic upright bodies may undergo less part-decomposition than other non-face objects. Correspondingly, a “whole-versus-part superiority effect” (Tanaka & Farah, 1993) has been documented for bodies as well as faces: recognition accuracy for body parts (e.g., “Tom’s arm”) is improved by presentation in the context of a novel whole body versus in isolation (McGoldrick, 2004; Seitz, 2002).

Another measure of holistic face processing, the composite effect paradigm (Schiltz & Rossion, 2006; Young, Hellawell, & Hay, 1987), has been applied to bodies, but the results have been mixed. When the task is posture recognition, “composite” stimuli made by combining halves of two body postures along the horizontal axis produce a standard “composite effect”: when identifying the posture of one half, participants show impaired performance when the halves are aligned versus misaligned (Willems, Vrancken, Germeys, & Verfaillie, 2014). However, when the task is person identity, the composite effect was found for bodies divided along the vertical axis, but was substantially reduced when the body was divided along its horizontal axis (Bauser, Suchan, & Daum, 2011; Robbins & Coltheart, 2012). Similar results were found with the inversion paradigm (Reed et al., 2006), in which inversion effects were seen only for body halves divided along the vertical axis, not the horizontal axis. These data suggest that the bilateral symmetry of the body’s structure might allow the visual system to holistically complete partial body stimuli around the vertical axis.

Thus, evidence across multiple paradigms indicates that bodies, like faces, may be processed in a holistic manner. Yet, in contrast to faces, bodies may be represented at multiple levels: not only the gross organization of the basic body structure, but also the finer-scale arrangement of the body’s constituent parts and sub-parts. For example, the arm not only forms a component of the body schema, but is also itself composed of parts with a well-defined spatial configuration: fingers, hand, wrist, forearm, elbow, etc.

As demonstrated by the rich and varied space of hand gestures available for non-verbal communication (Krauss, Chen, & Chawla, 1996), these parts and their arrangement may have ecological relevance for social interaction beyond what is conveyed in large-scale body posture. Likewise, neural responses have been reported throughout the visual processing stream not only for whole bodies (Oram & Perrett, 1996), but also for body parts in general (Downing & Peelen, 2011, 2015), as well as specific body parts such as hands (Bracci & Peelen, 2013; Gross, Rocha-Miranda, & Bender, 1972). At the conceptual level, a distinction among body parts can be seen in categorization studies, which find that body parts tend to be conceptually grouped by their functional capabilities rather than their physical divisions (Reed, McGoldrick, Shackelford, & Fidopiastis, 2004). These functions often require multiple body parts to be coordinated in order to perform larger-scale movements. For example, fingers, hands, forearms, and upper arms are often grouped so that the hand can perform reaching and grasping movements.

Thus, if holistic processing is associated with perceptual expertise and/or ecological relevance, then there is good reason to suspect that bodies may be represented holistically not only in their basic layout but also at a finer subscale. To examine this question, we employed a novel combination of the part-whole paradigm with stimuli modified via a stereoscopic depth manipulation (Harris & Aguirre, 2008a, b; Nakayama, Shimojo, & Silverman, 1989). In a two-alternative forced-choice paradigm, participants determined which of two images contained a specified body part; some of the images displayed parts in isolation and some displayed parts in the context of a whole body. By combining the visual stimulus of interest, body posture, with different binocular disparity cues, we produced two strikingly different percepts: (1) a body occluded behind a set of bars (Fig. 1, left), or (2) strips of a body floating in front of a background (Fig. 1, right). Critically, aside from the depth manipulation, these stimuli are identical in terms of their low-level stimulus properties; however, while the first case undergoes amodal completion and can be processed holistically, the latter cannot be completed and is therefore perceived in terms of its constituent parts.

Fig. 1. Experimental paradigm. In the study phase, participants viewed whole body postures. In the test phase, a 3D overlay was superimposed on the stimuli and participants recognized a body part configuration or pose either within the context of a neutral-posture whole body or in isolation.

Using this manipulation with faces, Harris and Aguirre (2008a) demonstrated a “whole-versus-part superiority effect” (Tanaka & Farah, 1993) for the same whole faces depending on whether they were perceived as being behind occluding bars, and thus amodally completed, or as floating in strips. This effect was similar in size to the advantage for whole faces versus isolated face parts, analogous to previous reports; however, recognition for parts presented in isolation was unaffected by the depth manipulation. Extending this approach to body postures provides a more definitive test of holistic processing for this spatially variable visual category, both at the basic level and for organization within parts. The stereoscopic manipulation provides a powerful means of testing holistic versus part-based processing for body postures without changes in the low-level physical properties of the visual stimulus. In addition, this paradigm allows us to look at multiple levels of body posture representation: that is, not just at the basic body structure but also at the within-part (arm, leg, etc.) organization.

As shown in Fig. 1, novel body postures were presented with binocular disparity cues that either enabled amodal completion behind occluding bars, or created the percept of body “strips” floating in front of a background. Performance for the two depth manipulations was measured not only for bodies but also for isolated body parts. This allowed us to examine whether bodies are processed holistically in terms of the basic body structure, as well as whether this holistic representation extends to smaller-scale body parts. If whole bodies are processed holistically, recognition accuracy should be reduced for whole bodies when they are perceived in front of, as opposed to behind, the bars indicated by binocular disparity. Furthermore, if body part configurations are also represented holistically, we predict additional reductions in accuracy for isolated parts depending on perceived stereoscopic depth. Because low-level visual properties are equated between our stimuli, poorer performance for isolated body parts floating in front of a background would be most consistent with holistic templates at a finer scale for individual parts. To ensure generalizability of our results, we performed two separate experiments employing this paradigm with different sets of body posture stimuli (Fig. 2), one of which had previously been shown to elicit a “whole-versus-part superiority effect” (McGoldrick, 2004).

Fig. 2. Example body posture stimuli from Experiments 1 (a) and 2 (b).

Methods

Subjects

Inclusion criteria consisted of normal or corrected-to-normal vision and the ability to discriminate the stereoscopic depth conditions, as demonstrated by indicating which stripes were in front versus in back for two sample images. Thirty-three participants (18 females, 18–23 years) in Experiment 1 (89 % of those tested) and 31 participants (13 females, 18–21 years) in Experiment 2 (100 % of those tested) met these criteria and volunteered in exchange for partial course credit. One participant in Experiment 1 was excluded from the analysis because his performance was at or below chance in more than one condition. All procedures were approved by Claremont McKenna College’s institutional review board.

Stimuli

In Experiment 1, six images of androgynous/male body poses were created in the Poser 7 3D modeling software (http://poser.smithmicro.com/). Each image was assigned a three-letter name label (e.g., Bob, Jim, Tom). Postures were configured to be abstract body poses that could not be easily described (e.g., salute) and were not iconic (e.g., “The Heisman”). In Experiment 2, six additional poses were selected from a stimulus set previously documented to produce the “whole-versus-part superiority effect” in bodies (McGoldrick, 2004). Each pose had a unique arrangement for arm, leg, and torso position (Fig. 2). Pose height, weight, and image zoom were constant across all images. The pose images were centered on a 250 × 250 pixel gray square. Poses were converted to grayscale to be visually separable from the 3D stripe overlay.

For test stimuli, a single body part (arm, leg, or torso) from the study poses was placed either in isolation or in the context of a whole body; a single foil was created for each part/pose by selecting a single body part from one of the other postures. These stimuli were placed into an image consisting of a gray background noise pattern overlaid with noise-patterned red and green stripes positioned to appear at 5 arcmin or 9 arcmin of disparity either in front of or behind the body stimulus when viewed with anaglyphic (red/green) glasses. The finished stimuli were 288 × 288 pixels, subtending 18.1° × 18.1° of visual angle, and were presented on a black background. As in Nakayama et al. (1989), this 3D stripe overlay was used to create a percept of a body behind a set of stripes (Fig. 1a) or strips of a body floating in front of a pixelated background (Fig. 1b). In the former condition, the visual system should fill in the regions hidden by the stripes, and the image will be perceived as a typical body posture or part occluded by bars. In the latter condition, the image cannot be completed amodally and thus is perceived in terms of its constituent parts.
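The reported stimulus size and disparities can be related to screen pixels with a short calculation. The sketch below is illustrative only: it derives pixels per degree from the reported 288 pixels and 18.1°, assumes the disparities are expressed in minutes of arc, and uses a linear approximation across the image. Viewing distance, screen resolution, and how the offset was divided between the red and green layers are not reported here, so the function names and the resulting pixel values are assumptions rather than the authors' settings.

```python
# Reported values from the Stimuli section.
STIM_SIZE_PX = 288      # stimulus width in pixels
STIM_SIZE_DEG = 18.1    # stimulus width in degrees of visual angle


def pixels_per_degree(size_px=STIM_SIZE_PX, size_deg=STIM_SIZE_DEG):
    """Average pixels per degree (linear approximation across the image)."""
    return size_px / size_deg


def disparity_to_pixels(disparity_arcmin, ppd=None):
    """Convert a disparity in arcmin to a horizontal offset in pixels.

    Assumption: the disparity is given in minutes of arc (1/60 degree).
    """
    if ppd is None:
        ppd = pixels_per_degree()
    return disparity_arcmin / 60.0 * ppd


if __name__ == "__main__":
    print(f"pixels per degree: {pixels_per_degree():.2f}")
    for arcmin in (5, 9):
        print(f"{arcmin} arcmin is about {disparity_to_pixels(arcmin):.2f} px")
```

Under these assumptions both disparities correspond to offsets of only one to three pixels, which is why the conversion is best treated as an order-of-magnitude check rather than a reconstruction of the display.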

Procedure

The experimental procedure was modified from Tanaka and Farah (1993) and Harris and Aguirre (2008a). The task had two parts: a study phase and a testing phase. During the study phase, participants learned name-body pose associations. They were presented with six target body poses without disparity cues. The images were of the same body but differed in the positions of the arms, legs, and torso (Fig. 1). Each pose was presented five times in random order, and participants studied each pose for a minimum of 5 s.

Next, participants performed a two-alternative forced-choice recognition test in which they determined which of two images was the target. The target and foil differed with respect to the part being tested. For test conditions, the two images could be an isolated body part (arm, leg, or torso) or that body part in the context of a whole body (Fig. 1). In addition, the parts and poses were superimposed with either front or back disparity depth conditions. The front disparity/part condition induced the perception that the pixelated bars were the background behind the images and that the images were bands of smaller parts. The back disparity/whole condition induced the perception that the bars floated in front of the images and that the images were complete behind obscuring pixelated bars. In each session, half of the correct responses were on the left and half were on the right. Trials were blocked by depth condition, for a total of 72 trials. The duration of the experiment was approximately 20 min.
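One way to picture the test phase is as a generated trial list. The sketch below is a hypothetical reconstruction, not the authors' experiment script: it assumes the 72 trials arise from fully crossing 6 poses × 3 parts × 2 contexts × 2 depths (which does equal 72, although the source does not state this breakdown), that block order is randomized, and that target side is balanced within each depth block. The pose labels, field names, and the `build_trials` function are all invented for illustration.

```python
import random
from itertools import product

POSES = [f"pose{i}" for i in range(1, 7)]   # hypothetical labels for the six poses
PARTS = ["arm", "leg", "torso"]
CONTEXTS = ["whole", "isolated"]
DEPTHS = ["back", "front"]


def build_trials(seed=0):
    """Return a list of trial dicts, blocked by depth condition.

    Assumptions: full crossing of pose x part x context within each depth
    block, random block order, and left/right target side balanced per block.
    """
    rng = random.Random(seed)
    depth_order = DEPTHS[:]
    rng.shuffle(depth_order)

    trials = []
    for depth in depth_order:
        block = [
            {"pose": pose, "part": part, "context": context, "depth": depth}
            for pose, part, context in product(POSES, PARTS, CONTEXTS)
        ]
        # Half of the targets appear on the left, half on the right.
        sides = ["left", "right"] * (len(block) // 2)
        rng.shuffle(sides)
        rng.shuffle(block)
        for trial, side in zip(block, sides):
            trial["target_side"] = side
        trials.extend(block)
    return trials


if __name__ == "__main__":
    trials = build_trials(seed=1)
    print(len(trials), "trials; first trial:", trials[0])
```

Seeding the generator makes a given participant's trial order reproducible, which is a design convenience here and not something the source specifies.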

Results

To assess the “whole-versus-part superiority effect,” holistic processing was measured in terms of mean accuracy for each participant and condition. Proportion accuracy data were entered into a repeated-measures Condition (2: Whole/Part) × Depth (2: Back/Front) analysis of variance (ANOVA) (Fig. 3). In Experiment 1, the difference between the target and foil in one arm/foil pair was obscured by the stereopsis bars, so four trials were eliminated from the analysis.
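In a 2 × 2 fully within-subject design, each effect has a single numerator degree of freedom, so its F(1, n − 1) statistic equals the square of a one-sample t statistic computed on per-participant contrast scores. The sketch below illustrates that equivalence using only the Python standard library. It is not the authors' analysis code, and the four rows of accuracy values are invented purely for illustration; the function name and the column order are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Column order (assumed): whole/back, whole/front, part/back, part/front.
CONTRASTS = {
    "Condition": (1, 1, -1, -1),
    "Depth": (1, -1, 1, -1),
    "Condition x Depth": (1, -1, -1, 1),
}


def contrast_f(scores, weights):
    """F and error df for a single-df within-subject contrast.

    scores: one tuple of four cell means per participant.
    Returns (F, df_error), where F is the squared one-sample t statistic
    of the per-participant contrast scores.
    """
    values = [sum(w * s for w, s in zip(weights, row)) for row in scores]
    n = len(values)
    t = mean(values) / (stdev(values) / sqrt(n))
    return t * t, n - 1


if __name__ == "__main__":
    # Invented proportions correct, NOT data from the experiments.
    example = [
        (0.80, 0.70, 0.70, 0.65),
        (0.75, 0.72, 0.68, 0.66),
        (0.70, 0.71, 0.72, 0.60),
        (0.78, 0.69, 0.66, 0.70),
    ]
    for name, weights in CONTRASTS.items():
        f_value, df_error = contrast_f(example, weights)
        print(f"{name}: F(1, {df_error}) = {f_value:.2f}")
```

With the 32 and 31 participants analyzed in the two experiments, the same computation yields error degrees of freedom of 31 and 30, matching the F tests reported for each effect.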

Fig. 3. Proportion accuracy data for Experiments 1 (a: n = 32) and 2 (b: n = 31) as a function of the stereo manipulation (body perceived as being in front or back of the 3D stripes) and context (part recognition as an isolated part or in the context of a whole body). Error bars represent standard error.

Consistent with previous reports of whole-versus-part superiority, performance in Experiment 1 was higher for body part recognition when targets were presented in the context of the whole body (mean = 0.73; se = 0.02) rather than in isolation (mean = 0.69; se = 0.02) (Condition: F(1, 31) = 4.65, p = 0.039, ηp² = 0.13). Additionally, accuracy was significantly worse for both types of stimuli when they were perceived as in front (mean = 0.69; se = 0.01) rather than in back (mean = 0.73; se = 0.02), with respect to stereoscopic depth (Depth: F(1, 31) = 5.70, p = 0.023, ηp² = 0.16). Consistent with the idea that the stimulus perceived as floating strips cannot be completed and thus is processed in a part-based fashion, performance for whole bodies in the frontal depth plane was not significantly different from that for isolated body parts perceived as behind occluding stripes (t(31) < 1, ns).

Experiment 2 replicated these results with a different stimulus set. Performance was higher for body part recognition in the context of the whole body (mean = 0.75; se = 0.02) rather than in isolation (mean = 0.70; se = 0.02) (Condition: F(1, 30) = 7.84, p = 0.01, ηp² = 0.21). Accuracy was worse for whole and part stimuli when they were seen as in front (mean = 0.71; se = 0.01), rather than in back (mean = 0.74; se = 0.01), with respect to stereoscopic depth (Depth: F(1, 30) = 4.30, p = 0.047, ηp² = 0.13). Finally, performance for whole bodies in the frontal depth plane was not significantly different from that for isolated body parts perceived as behind occluding stripes (t(30) < 1, ns).
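The partial eta squared values reported for the main effects can be checked directly from the F statistics and their degrees of freedom, using the standard identity that partial eta squared equals F × df_effect / (F × df_effect + df_error). The sketch below applies that identity to the four reported main effects; the function name and the list layout are illustrative choices, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported in the Results section.
REPORTED = [
    ("Exp. 1 Condition", 4.65, 1, 31),
    ("Exp. 1 Depth", 5.70, 1, 31),
    ("Exp. 2 Condition", 7.84, 1, 30),
    ("Exp. 2 Depth", 4.30, 1, 30),
]

if __name__ == "__main__":
    for label, f_value, df_effect, df_error in REPORTED:
        eta = partial_eta_squared(f_value, df_effect, df_error)
        print(f"{label}: partial eta squared = {eta:.2f}")
```

Rounded to two decimals, the recomputed values agree with the effect sizes given in the text (0.13, 0.16, 0.21, and 0.13).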

These findings support the idea that whole bodies are represented in a holistic manner, and that the manipulation of stereoscopic depth can disrupt this processing. However, a further question is whether the binocular disparity manipulation also affects recognition of body parts in isolation. Critically, the Condition × Depth interaction was not significant in either Experiment 1 (F(1, 31) < 1, ns) or Experiment 2 (F(1, 30) < 1, ns). Unlike previous research for faces, which found no difference in the recognition of parts depending on depth (Harris & Aguirre, 2008a), in both experiments performance for isolated body parts was poorest when they were presented as stripes in front. This finding cannot be attributed to differences in low-level stimulus properties, suggesting a perceptual advantage even for isolated parts when they can be amodally completed.

Discussion

Although faces have been used extensively to explore holistic processing in high-level vision, this visual category is distinct in terms of both the degree of anatomic constraint of the parts and the level of functional organization for subordinate levels of representation. In these respects, bodies provide an excellent test case: Like faces, they are highly relevant for social communication, but in addition body parts themselves may have more detailed subdivisions such as hand, wrist, and elbow.

We employed a novel approach to investigate the holistic processing of body postures at the whole and part level: a stereoscopic manipulation that created either the percept of a whole body occluded by a set of bars or segments of body floating in front of a background. Despite having identical low-level properties, only the first stimulus should be perceived holistically due to filling-in via amodal completion (Harris & Aguirre, 2008a; Nakayama et al., 1989). Using this manipulation in conjunction with the whole-versus-part superiority paradigm (Tanaka & Farah, 1993) in two separate experiments, we found improved recognition accuracy for postures of body parts when they were presented in the context of whole bodies, rather than in isolation, in line with previous results (McGoldrick, 2004; Seitz, 2002). Additionally, performance was better for stereoscopic conditions where the body is perceived to be intact behind the bars rather than as floating body segments, suggesting that the binocular disparity information affects holistic processing.

In our paradigm, faces were either occluded or removed from the stimuli, yet we nonetheless found evidence of holistic processing for body postures. These results differ from recent work by Brandman and Yovel (2010) suggesting that body inversion effects might be driven by the presence of the face, both behaviorally and at the neural level. However, the link between inversion and holistic processing is debated, with some researchers suggesting that inversion effects may arise from quantitative decreases in processing efficiency rather than qualitative changes in the nature of processing (Sekuler, Gaspar, Gold, & Bennett, 2004).

Most notably, we found no interaction between the depth manipulation and whether the body part to be recognized was presented within a body context or in isolation. This result stands in contrast to Harris and Aguirre’s (2008a) study, which revealed a significant two-way interaction of depth and presentation context for faces using an identical paradigm with similar sample size and accuracy rates. They found that when the amodal completion of whole faces was disrupted, performance was similar to that for isolated face parts regardless of depth (approximately 70 % accuracy for both depth conditions). Instead, we found no interaction, as there were effects of both the isolated part presentation context and the frontal binocular disparity manipulation, which removed amodal completion. Thus, the pattern of results for bodies is comparable to that for faces in every respect except for the condition of isolated parts floating in the frontal depth plane. Together with recent studies of categorization and performance for divided body halves, these results support the idea that body parts may themselves be subject to perceptual processing beyond basic decomposition into parts.

Given these findings, a further question is that of the extent of holistic processing for visual categories with subscale organization, such as bodies. As mentioned above, the arm can be perceived as part of a larger whole, but also in terms of its constituent parts. Some of these components have further systematic organization within themselves: for example, the hand is a subdivision of the arm, but also includes differentiable parts such as fingers, thumb, and palm. It is not coincidental that the upper body, in particular, has been identified as diagnostic in other studies of body posture (Reed et al., 2006): the arms and hands play an important role in signaling intention and emotion and are associated with varied and dynamic forms of nonverbal communication (Krauss et al., 1996). These results argue for subdivision of visual stimuli not only on the basis of visual part segmentation, but also with respect to the functions associated with different subregions (Reed et al., 2004).

More generally, these results extend our understanding of holistic processing by supporting the idea that such representations can vary in spatial extent. Although previous experiments using faces and bodies have focused on the idea of a large-scale template or gestalt, holistic representations for complex visual stimuli may exist at multiple spatial scales. Further research should be undertaken to disentangle the roles of visual and functional segmentation in holistic processing for complex visual categories.