Limited evidence of individual differences in holistic processing in different versions of the part-whole paradigm

Sunday, Mackenzie A.; Richler, Jennifer J.; Gauthier, Isabel

doi:10.3758/s13414-017-1311-z

Published: 30 March 2017

Volume 79, pages 1453–1465, (2017)

Attention, Perception, & Psychophysics

Mackenzie A. Sunday¹,
Jennifer J. Richler¹ &
Isabel Gauthier¹

Abstract

The part-whole paradigm was one of the first measures of holistic processing and it has been used to address several topics in face recognition, including its development, other-race effects, and more recently, whether holistic processing is correlated with face recognition ability. However, the task was not designed to measure individual differences and it has produced measurements with low reliability. We created a new holistic processing test designed to measure individual differences based on the part-whole paradigm, the Vanderbilt Part Whole Test (VPWT). Measurements in the part and whole conditions were reliable, but, surprisingly, there was no evidence for reliable individual differences in the part-whole index (how well a person can take advantage of a face part presented within a whole face context compared to the part presented without a whole face) because part and whole conditions were strongly correlated. The same result was obtained in a version of the original part-whole task that was modified to increase its reliability. Controlling for object recognition ability, we found that variance in the whole condition does not predict any additional variance in face recognition over what is already predicted by performance in the part condition.

Several phenomena suggest that faces are processed as singular units rather than by individual facial features. Such holistic processing is considered a hallmark of face perception (e.g., Maurer, Le Grand, & Mondloch, 2002; Tanaka & Farah, 1993; but see Donnelly, Cornes, & Menneer, 2012; Fitousi, 2015), and yet its role in face recognition is still not fully understood. In particular, one assumption is that holistic processing contributes to the efficacy of face recognition (Richler & Gauthier, 2014; Rossion, 2013), such that those who excel at face recognition should rely more heavily on this kind of processing. Interestingly, the best evidence linking recognition ability to holistic processing is obtained with non-face objects, whereby holistic processing increases with perceptual expertise (Chua, Richler, & Gauthier, 2015; Gauthier, Williams, Tarr, & Tanaka, 1998; Wong, Palmeri, & Gauthier, 2009). These studies with objects used a holistic processing measure, the composite paradigm, which defines holistic processing as a failure of selective attention to parts (i.e., participants are instructed to attend to only part of the object but are unable to do so and consequently are influenced by the other parts). When this paradigm is used with faces, holistic processing is detected with a large effect size, compared to negligible effects for objects in novices (Richler & Gauthier, 2014). However, in terms of the relation between holistic effects and face recognition ability, when confounds from stimulus repetition in the holistic processing measure are removed, holistic processing of faces does not predict face recognition ability (Richler, Floyd, & Gauthier, 2015), though there is some evidence of impaired holistic processing in individuals with congenital prosopagnosia (Avidan, Tanzer, & Behrmann, 2011; Carbon, Grüter, Weber, & Lueschow, 2007).

Nevertheless, because holistic processing is a central construct in the study of face recognition (e.g., Maurer & Young, 1983; Richler, Palmeri, & Gauthier, 2012), and because it is associated with non-face expertise (Boggan, Bartlett, & Krawczyk, 2012; Bukach, Philips, & Gauthier, 2010; Busey & Parada, 2010), it is difficult to abandon the idea that people who are better at face recognition process faces more holistically. So far, only one measure of holistic processing (the composite task, Richler, Floyd, & Gauthier, 2014) has been adapted for the study of individual differences. Because research on individual differences benefits from measuring abilities as latent variables that capture the shared variance among multiple indicators of a construct (Bollen, 2002), it is important to provide other ways to reliably measure individual differences in face-specific holistic processing. Therefore, here we aimed to adapt another paradigm, the part-whole task, for this purpose. The underlying motivation was to ask whether individual differences in holistic processing as measured by a different paradigm might predict face recognition ability. As we discuss below, there is no clear empirical evidence that the part-whole task taps into the same meaning of holistic processing as the composite task (Richler et al., 2012), and so their respective relations to face recognition ability could differ.

Because research on holistic processing has historically focused on group-level effects, many of these tests excel at capturing group effects (Richler & Gauthier, 2014), but lack the reliability necessary to measure individual differences (e.g., DeGutis, Wilmer, Mercado, & Cohan, 2013; Ross, Richler, & Gauthier, 2015). Individual differences work requires tests that produce reliable scores, which at a minimum means that the tests have good internal consistency. Put simply, a measure that does not correlate with itself cannot be expected to correlate with other measures. Currently, only one holistic processing test, the Vanderbilt Holistic Processing Test – Face (VHPT-F; Richler et al., 2014), was designed specifically to measure individual differences. Although far from perfectly reliable, it has produced more reliable measurements (~.6) than the task from experimental studies on which it is based (~.2–.4, DeGutis, Wilmer, et al., 2013; Ross et al., 2015). A high score on the VHPT-F reflects an inability to selectively attend to single face parts that are clearly identified on each trial. Despite reliable variability between individuals in their ability to selectively attend to face parts on this test, this variability appears to be unrelated to face recognition, at least as measured by the Cambridge Face Memory Test (CFMT; Duchaine & Nakayama, 2006), a popular measure of face recognition ability (Richler et al., 2015).

Although the definition of holistic processing targeted by the VHPT-F is not related to CFMT performance, other definitions of holistic processing could be. For instance, an individual’s ability to use information from a whole face when it is available may be more relevant to face recognition than an individual’s ability to selectively attend to face parts. In the part-whole paradigm (Tanaka & Farah, 1993), face parts are presented alone or in the context of complementing face parts (i.e., in a whole face) that do not add diagnostic information. The rationale behind this paradigm is that the whole context changes the manner in which parts are encoded, such that a holistic process is only engaged when face parts are shown in a whole face context (DeGutis, Mercado, Wilmer, & Rosenblatt, 2013). Although previous studies have examined how part-whole and face recognition measures relate, the part-whole measurements had relatively low reliabilities, limiting correlations with other measures (e.g., λ₂ = .31, DeGutis, Wilmer, et al., 2013; λ₂ = .33, DeGutis, Mercado, et al., 2013). In addition, in these studies a small number of faces were repeated in both the face recognition task and the part-whole task, which can inflate correlations between tasks (Richler et al., 2015). Therefore, to investigate if face recognition ability and the part-whole effect relate, we first needed to create a new version of the part-whole paradigm that produces reliable measurements and has limited stimulus repetition.

Overview

In Study 1, we created a new part-whole test designed to reliably measure individual differences in holistic processing (the Vanderbilt Part Whole Test, VPWT). In Studies 2 and 3 we test if one aspect of our new task, placing isolated parts in the context of phase-scrambled noise, produces different results from the more traditional presentation of face parts in isolation, and compare the VPWT to the original part-whole paradigm to ensure that our results are not test specific. Finally, in Study 4 we examine how VPWT performance relates to measures of face (CFMT) and object (Vanderbilt Expertise Test; VET) recognition abilities.

Study 1

We made a number of modifications to the original part-whole paradigm in attempts to maximize its validity and reliability,^{Footnote 1} which are discussed below.

Methods

Participants

Several pilot tests were run on Amazon Mechanical Turk to develop the VPWT 1.0 (N = 20–137 per pilot test, total N = 568) and participants were paid between US$0.45 and US$0.65 for completing one test. When used properly, online crowdsourcing tools like Amazon Mechanical Turk can provide high-quality data (Buhrmester, Kwang, & Gosling, 2011; Hauser & Schwarz, 2016; Mason & Suri, 2012; Paolacci, Chandler, & Ipeirotis, 2010), and thus are a good choice for task development and piloting given the ease and speed at which data can be collected. Furthermore, one study has reported only a negligible difference between online and lab participant pools for individual differences measurements (Cho et al., 2015).

Data for the VPWT 1.0 were collected as part of a larger dataset that also included the Vanderbilt Face Matching Test (VFMT) and the VHPT-F. Only the data from the VPWT 1.0 are reported here. One hundred and sixty-four participants were recruited from Amazon Mechanical Turk to complete the VFMT. We contacted these participants one day after they completed the VFMT to offer them the opportunity to complete the VPWT and VHPT-F. Participants were compensated US$0.85 for completing the VFMT, and given an additional bonus of US$2.50 if they completed both the VPWT and VHPT-F. Of the 104 participants who completed the VPWT (39 males; mean age = 40.12 years, age range = 19–76 years), 81.7% were Caucasian, 8.7% were Asian, 5.8% were Hispanic/Latino, 2.9% were African-American, and 1.0% were Native American.

Stimuli

The VPWT used 500 images taken from the internet of forward-facing, unfamiliar, Caucasian faces from 400 identities (200 male, 200 female) that differed in lighting. Since the target identity required two images of the same identity, each trial needed five images total of four identities (four unique identities plus one target identity with two images). Using Adobe Photoshop, the faces were converted to grayscale and cropped to exclude the area from the ears outward. On trials in which the target part included the top portion of the face, the entire background, including external facial features, was removed to prevent use of non-internal face information. On part trials, non-target face portions were phase-scrambled. Each face was assigned to a part-size condition: top two-thirds, bottom two-thirds, top one-third, bottom one-third, top half, bottom half, eyes, nose, and mouth (Fig. 1). The part was then combined with a complementing face portion (either real or phase-scrambled) to create a complete face. The target part was outlined in red (1.5-pt thick). There were approximately equal numbers of trials for each face size, part size, and whole versus part condition. The same complementing face portion was used in the study face and all three test faces (both target and distractors), such that it was not diagnostic. This method of combining face parts is used in the VHPT, a version of the composite task designed to measure individual differences that generally reveals large effect sizes for holistic processing (Richler et al., 2014). In addition, large holistic processing effects have been obtained with similar composite faces made of two different face halves (Richler & Gauthier, 2014). Different images of the target person were presented at study and test to prevent image matching.

Procedure

The test began with instructions and practice trials (pilot 1: two cartoon, one famous face; pilot 2: two cartoon, five famous faces), followed by 81 (pilot 1) or 100 (pilot 2) experimental trials. Participants were instructed to “just try to memorize the red highlighted part” of each face. This is different from the original part-whole task, in which participants are instructed to “pay attention to the entire target face.” Therefore the new task requires selective attention at encoding, a choice we made for two reasons. First, we wanted to ensure that we measured a default holistic advantage in encoding parts in the context of a whole face (i.e., that people cannot help but encode faces holistically, rather than doing so only when the task encourages them to do so). Second, the composite task requires selective attention at encoding and produces large holistic effects on average (Richler & Gauthier, 2014) and reliable individual differences (Richler et al., 2014), so we expected that the selective attention instructions would yield similar holistic effects in the present task (at least if the two paradigms tap into the same underlying mechanism). In addition, pilot testing in a version in which the red box was not present suggested that instructing participants to memorize the entire face, rather than asking them to selectively encode a part, did not affect the results.^{Footnote 2} On the present version, on each trial, a study face was shown for 2 s (Fig. 2). Next, participants made an un-speeded three-alternative forced choice about which option contained the target face part that matched the identity of the study part. Response selection was not speeded to limit response bias (Richler, Mack, Gauthier, & Palmeri, 2009). Participants were given feedback during all practice trials and the first three experimental trials. 
Trials were blocked by target part, and presented in order of decreasing target part size (top two-thirds, bottom two-thirds, top half, bottom half, top third, bottom third, eyes, nose, mouth). Part, whole, male, and female face trials were randomized. Each pilot test and the final version took approximately 15 min to complete.

The VPWT 1.0 had 100 trials and was created based on iterative item analysis of pilot results with larger sets of trials. As part of this process, we selected trials with a range of difficulty and on the basis of their correlation with their own condition average, and matched the difficulty of target face parts used in the whole and part conditions.
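The selection criterion mentioned above (a trial's correlation with its own condition average) can be illustrated with a minimal sketch. The function name `item_rest_correlations`, the toy accuracy matrix, and the choice to leave each trial out of the score it is correlated with are assumptions made for illustration; the text does not specify the exact computation that was used.

```python
import numpy as np

def item_rest_correlations(scores):
    """Correlate each trial with the summed score of the other trials.

    scores: participants x trials matrix of 0/1 accuracy for ONE condition
    (hypothetical layout). A trial with no variance yields NaN.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    correlations = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]  # leave the trial itself out
        correlations.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return np.array(correlations)

# Toy data: 4 participants x 3 trials. Trials 0 and 1 agree with each
# other; trial 2 is unrelated to them.
toy = [[1, 1, 0],
       [1, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
item_r = item_rest_correlations(toy)
```

In this toy case the first two trials correlate positively with the rest of their condition, while the third does not, so an item analysis of this kind would flag the third trial as a candidate for removal.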

Modifications from original task

To improve the test’s reliability, we used a 3-AFC format that reduces the guessing rate. In addition, unlike the original part-whole task that only used the eyes, nose, and mouth as target parts, in the VPWT, we varied the size of the target parts (Fig. 1) to create trials that vary in the extent to which they could benefit from the presence of the non-diagnostic rest of the face. Face size also varied (.59, 1.01, or 1.59 in. in face width). Size modulates holistic processing (McKone, 2009; Ross & Gauthier, 2015), so in principle varying size should increase the ability to discriminate between participants along a broader range of abilities (Richler et al., 2014). Importantly, these aspects of the task are not factors of interest, but were varied to help provide discriminating information along the whole continuum of holistic processing. This is analogous to using questions with a broad range of difficulty in an intelligence test. If all questions were of equal difficulty, the test would only differentiate between those who can correctly answer questions at that difficulty level and those who cannot. By using questions from a range of difficulty levels, the test is better able to discriminate between individuals at all levels of ability. Similarly, we did not intend for face and part size factors to be analyzed and interpreted in our test (see Richler et al., 2014, for a similar strategy in the modification of the composite task).

To improve the validity of the VPWT, we added phase-scrambled information to the part condition such that part trials were presented with a phase-scrambled face in the complementing portion (see Fig. 2). We added this phase-scrambled information to keep the spatial frequency properties of part and whole trials similar (in Study 2 we verify that it had no other confounding effects). In addition, the target study and response images were different images of the same identity, so that image matching was not possible.

In the original part-whole task, participants study a whole face and then are tested on recognition of either a face part presented in isolation or within a non-diagnostic face. However, a previous study reported large study-test congruency effects with the part-whole paradigm, such that when participants studied a part, they performed better in the part versus whole condition (Leder & Carbon, 2005). Here, we matched study and test format to exclude the possibility of a context-dependent advantage, which is not the construct of interest. In this way, we aimed to ensure that our test measures how well an individual’s memory of a face part is improved when that part is presented within the context of a complete face, rather than how sensitive an individual is to study-test congruency.

Because we matched study and test format, participants know as soon as they see the study face which part is relevant for the entire trial (i.e., there is no study-test incompatibility). Moreover, the target face part is outlined in red, ensuring that participants know which part will be tested as soon as they see the study face. If encoding whole faces provides an advantage and holistic encoding is under top-down control, this would encourage participants to process study faces holistically when possible. If holistic encoding is not under top-down control, it will occur automatically when whole faces, but not parts, are studied.

Because we matched the study and test format on our test, we did not expect the group level whole advantage to be large (as shown in several experiments by Leder & Carbon, 2005, when there is no contribution of study-test format incompatibility). Because we were interested in how individuals perform relative to others, the test’s validity would not be threatened if we did not find that participants were on average more accurate on whole versus part trials, as we are concerned here with the variability between individuals in their ability to use a whole face when possible.

Results

In the part-whole paradigm, variation on part trials is assumed to reflect general visual perception and face part processing abilities, whereas variation on whole trials reflects those processes as well as an additional holistic process (DeGutis, Wilmer, et al., 2013). Accordingly, holistic processing in the VPWT is operationalized by the variability on whole trials with variability on part trials regressed out (see DeGutis, Wilmer, et al., 2013). We used Guttman’s λ₂ instead of Cronbach’s α to calculate internal consistency of this holistic processing index because the VPWT has multiple conditions and calculation of Guttman’s λ₂ incorporates the covariance between items (Guttman, 1945). Guttman’s λ₂ (based on the formula in Malgady & Colon-Malgady, 1991) was .47 and .43 in Pilot 1 and Pilot 2, respectively.
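As a rough illustration of the regression-based index described above, the sketch below regresses whole-trial accuracy on part-trial accuracy and keeps the residuals as each participant's holistic score. The function name `holistic_residuals` and the accuracy values are hypothetical, and the sketch uses ordinary least squares via `numpy.polyfit`, which is an assumption about the fitting method. It does not reproduce the reliability (λ₂) computation for the residual scores.

```python
import numpy as np

def holistic_residuals(part, whole):
    """Residuals of whole-trial accuracy after regressing out part-trial accuracy."""
    part = np.asarray(part, dtype=float)
    whole = np.asarray(whole, dtype=float)
    # Degree-1 fit returns [slope, intercept] for whole ~ part.
    slope, intercept = np.polyfit(part, whole, 1)
    return whole - (slope * part + intercept)

# Hypothetical per-participant proportions correct (not data from the study).
part_acc = np.array([0.40, 0.50, 0.60, 0.70])
whole_acc = np.array([0.45, 0.50, 0.70, 0.65])
residuals = holistic_residuals(part_acc, whole_acc)
```

By construction, the residuals average zero and are uncorrelated with part accuracy, so whatever variance remains in them is the variance in whole performance that part performance does not explain. When part and whole scores are almost perfectly related once measurement error is accounted for, little beyond error is left in these residuals.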

Accuracy was not significantly greater on whole than part trials (part: M = 56.6%, SD = 10.7%; whole: M = 57.6%, SD = 9.9%; t(206) = 1.03; p = .307, d = .10), indicating no whole-advantage at the group level. Although whole and part trials had moderate internal consistency (whole α = .59, part α = .64), reliability of the holistic processing index (variance in whole trials after regressing out variance in part trials) was only .16 (λ₂). The holistic processing index had much lower internal consistency than either of the two conditions because the correlation between the part and whole conditions was essentially as high as their respective reliabilities (r₁₀₄ = .60, p < .001, r_corr = .98, R² = .36).

One concern is that trials with large target face parts (halves and two-thirds) may be responsible for the strong correlation between part and whole conditions, because larger face parts more closely resemble the whole condition. To see if this was the case, trials were grouped by target part size (small parts: eyes, nose, and mouth; medium parts: halves and thirds; large parts: two-thirds). The correlations between part and whole conditions were r₁₀₄ = .44 (95% confidence interval (CI) [.27–.58], p < .001, r_corr = 1.11), r₁₀₄ = .40 (95% CI [.22–.55], p < .001, r_corr = 0.99), and r₁₀₄ = .35 (95% CI [.17–.51], p < .001, r_corr = 1.15) for large, medium, and small parts, respectively. Thus, regardless of part size, performance in the part condition almost fully accounts for performance in the whole condition when measurement error is considered. While the shared variance is numerically smaller for the smaller parts, this condition was the least reliable (Cronbach’s α used here for individual conditions, small part α = .38, whole α = .24; medium part α = .41, whole α = .40; large part α = .36, whole α = .44).
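The corrected correlations reported above are consistent with the standard correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below applies that formula to the rounded values given in the text. The function name `disattenuate` is hypothetical, and using Cronbach's α as the reliability estimate for each condition is an assumption based on the values reported here.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Inputs are the rounded correlations and alphas reported in the text.
overall = disattenuate(0.60, 0.64, 0.59)  # all trials
large = disattenuate(0.44, 0.36, 0.44)    # two-thirds parts
medium = disattenuate(0.40, 0.41, 0.40)   # halves and thirds
small = disattenuate(0.35, 0.38, 0.24)    # eyes, nose, mouth

for label, value in [("overall", overall), ("large", large),
                     ("medium", medium), ("small", small)]:
    print(f"{label}: {value:.2f}")
```

With these rounded inputs the overall, large, and medium values match the reported .98, 1.11, and 0.99, and the small-part value comes out within about .01 of the reported 1.15, a difference that plausibly reflects rounding of the inputs. Corrected values above 1 are not meaningful as correlations; they can occur when the reliability estimates are low and themselves subject to sampling error.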

Discussion

We attempted to create a reliable version of a modified part-whole paradigm. Although the test did not achieve sufficient reliability, the reason for this failure is interesting. The part and whole conditions are each fairly reliable independently, but are as correlated as possible given their respective reliabilities, yielding a disattenuated correlation of r_corr = .98 (Wetcher-Hendricks, 2006). Thus, apart from measurement error, there is a near perfect correlation between the two conditions. One limitation of the VPWT 1.0 is that while the difficulty of specific parts was matched across the two conditions, the parts used were different to limit part repetition. However, versions of the original part-whole task in which the same parts were used in both conditions have produced very similar results, with part and whole conditions that were moderately reliable but lower reliability for the holistic regression index (DeGutis, Wilmer, et al., 2013). Nonetheless, the use of different parts in the two conditions also confounds any interpretation of the (absent) group-level whole advantage.

Our results suggest that performance on whole trials can essentially be perfectly predicted by performance with parts. This is very different from results in the composite paradigm, where the shared variance between critical conditions is only about 6% (Richler & Gauthier, 2014). Of course, it is possible that one of the ways in which we modified the paradigm challenged its validity. To investigate this, we replicated Study 1 without the phase-scrambled noise on part trials (Study 2), with the same parts used in whole and part conditions to equate difficulty (Studies 3 and 4), and with a version of the original part-whole paradigm (Study 3).

Study 2

One way in which the VPWT differs from the typical part-whole paradigm is the use of phase-scrambled noise in the parts condition. Although the whole advantage is thought to arise from facilitation of part memory when the target part is presented within a whole face context, it is possible that participants were also able to process the part in the phase-scrambled noise context holistically. For example, participants could have interpreted the noise parts as a disguise, or perceived face parts in the noise itself. We tested this using a version of the VPWT in which parts were presented in isolation on a white background (i.e., without phase-scrambled noise), similar to the original task.