Introduction

At a glance, human observers perceive a vast amount of information from faces, including identity, expressions, race, gender, and trait characteristics (e.g. trustworthiness, attractiveness, competence)1. Such information is highly functional and versatile across social interactions. While rapid social attributions from faces are often useful in daily life (e.g. deciding whether to approach or avoid a stranger), personality traits perceived from faces may also serve as heuristics for quickly forming preferences and inferences that underlie important decisions, including casting a vote for a political candidate in a democratic election2,3. Although information about political candidates beyond facial characteristics is often readily available to voters, consistent evidence has shown that personality trait judgments based solely on photographs, made by naïve participants with no prior knowledge of the candidates, can reliably predict actual voting outcomes in the real world and hypothetical voting outcomes in laboratory settings4,5,6,7.

Among various perceived personality traits, competence has most consistently been found to predict election outcomes. In a typical study6, photographs of opposing political candidates were shown simultaneously and participants were asked to judge the relative difference in perceived competence4,6,8,9. Alternatively, participants were asked to provide competence ratings for individually presented photographs of the candidates10. Candidates who were rated as more competent often received more votes in the elections4,6,10. Moreover, when asked to choose the preferred or winning candidate from pairs of candidates, participants’ selections often matched the real-world outcomes, with the actual winners consistently chosen. These judgments were highly consistent among participants, even when they only had a glimpse of the faces (e.g. 100 ms)3,4,6. Longer or unlimited presentation times were not necessary, as neither significantly changed the choices, and deliberation over the decisions reduced or minimized the effects4. Preference for winning politicians over runners-up based on visual cues has been replicated across elections in multiple countries11,12,13,14,15,16,17,18 and has been observed not only among adults but also young children11.

Since competence is often considered one of the most desirable leadership qualities across cultures19, it may not be surprising that people with facial characteristics that signal high competence are more likely to be selected as leaders. Evaluation of facial appearance may be based on stereotypes of gender roles: masculine facial characteristics likely contribute to perceptions of men’s competence and attractiveness20, and perceived facial competence tends to be higher for male than female faces10,21. Within each gender, faces with highly prototypically masculine characteristics are also rated as more competent3,21,22, although female faces with counter-stereotypical traits (e.g. masculine features) may be evaluated negatively23. Nonetheless, gendered facial appearance has been found to relate to voting behavior both in past real-world elections and in hypothetical elections in the laboratory20,24.

But to what extent is the perception of competence from visual cues based on facial rather than non-facial features? If facial features are critical, would manipulations of the facial features of winning and losing candidates increase or decrease their perceived competence and the margin of their predicted winning chance? Indeed, there is evidence that changes in facial features can systematically alter personality traits perceived from faces. For instance, data-driven computational models have been developed using large sets of computer-generated synthetic faces to vary perceived trait dimensions7,25,26,27. Such models have also been validated using unfamiliar human faces with standardized external features (e.g. hairstyle or pose21), indicating a close relationship between the visual characteristics of facial features and perceived personality traits. However, previous studies that examined the relationship between facial traits and election outcomes primarily used photographs of real-world politicians. Apart from the differences between the candidates’ facial features, the photographs often also contain non-facial features such as hairstyle, pose, or attire. These non-facial visual features could potentially bias participants’ trait judgments or choice behaviors28,29,30. Therefore, it is important to rule out the influence of non-facial features in order to understand whether and how intrinsic facial features affect the perception of competence and the prediction of actual election winners.

To date, relatively limited research has directly addressed this issue. A previous study extracted the facial features of winners and runners-up from several general or presidential elections (e.g. Bush vs. Kerry, Blair vs. Major)15. To prevent participants from recognizing the faces, the winner and runner-up faces were morphed with a composite face made from unfamiliar faces, while the resulting faces preserved the facial differences between the winners and runners-up. Although the new faces appeared unfamiliar to the participants, hypothetical voting choices were highly correlated with the actual election outcomes. Moreover, when unfamiliar faces had their facial features adjusted along the masculinity-femininity continuum, masculinized faces were more likely to be selected as leaders in a wartime context, whereas feminized faces were more likely to be selected as leaders in peacetime15. However, it remains unclear how the visual characteristics of the original winner and runner-up faces, independent of non-facial cues, might affect the perception of potential leadership qualities. Moreover, given the potential differences in visual facial cues among political candidates, it is possible that direct manipulations of the masculinity and femininity of the candidates’ facial features could influence perceived competence and predicted election outcomes.

This study directly examined whether facial features extracted from the original images alone are sufficient for evaluating competence and predicting election outcomes. We used faces of politicians who took part in a range of lesser-known United States elections, including congressional, gubernatorial, and mayoral races, and recruited participants from the United Kingdom who had limited or no prior knowledge of the political candidates. Across four experiments, we examined how the facial characteristics of the candidates were evaluated. In all experiments (Experiments 1 to 4), participants completed two tasks in a counter-balanced order across participants. In the competence judgment task, participants rated the perceived competence of each individually presented face. In the election judgment task, participants selected, from each pair of opposing candidates, the one who would likely win the election.

Because both tasks were included in all four experiments, the presentation order of the tasks was counter-balanced across participants: half of the participants completed the competence judgment task first (Experiments 1A, 2A, 3A, and 4A), and the other half completed the election judgment task first (Experiments 1B, 2B, 3B, and 4B). Our main analyses relating competence ratings to election outcome prediction focused on first impressions, so performance on the two tasks was compared across the separate groups of participants who viewed the face images for the first time (e.g. to investigate the difference in competence judgment between the original and computerized faces, the competence ratings in Experiments 1A and 2A were compared; to examine the relationship between competence judgment and election outcome prediction for the original faces, the competence ratings in Experiment 1A were related to the election outcome prediction accuracy in Experiment 1B). Nonetheless, performance on the two tasks within participants (e.g. the competence ratings and election outcome accuracy within Experiment 1A) is also reported (Table 1).

Table 1 Descriptive and inferential statistics on the competence ratings for the images of the political candidates.

In Experiment 1, the original candidate photographs were presented. Experiment 2 used face images with facial features extracted from the photographs. The face images were then modified to create masculinized and feminized versions of the candidates. Experiment 3 showed the masculinized images of the winners and the feminized images of the runners-up. Experiment 4 presented the feminized images of the winners and the masculinized images of the runners-up. Replicating previous findings using candidate photographs (e.g. Chiao et al.10; Todorov et al.6), we expected that Experiment 1 would reveal higher competence ratings for the winners than the runners-up. We also expected that the relative differences in competence ratings between the opposing candidates would positively correlate with the proportion of the winners being selected among the pairs. Importantly, such correlations were expected not only within the same group of participants10 but also across different groups of participants4,6.

If these results were at least partially due to the differences in facial features among the candidates, independent of any contributions from other information available from the original images, similar results were also expected for Experiment 2. Furthermore, if competence ratings were increased by masculine facial features and were decreased by feminine facial features10,21, the relative differences in competence ratings between the winners and the runners-up should be larger in Experiment 3 than Experiment 4. Likewise, the prediction accuracy of the winners based on the face images was also expected to be higher in Experiment 3 than in Experiment 4. Nonetheless, if both perceived competence ratings and election outcome prediction judgments were dependent on the evaluation of facial features, the correlations between the two tasks should remain high in both Experiments 3 and 4.

Experiments 1 and 2

Method

Participants

A total of 600 White participants who were United Kingdom (UK) nationals participated online via the Prolific platform (www.prolific.io). All participants reported normal or corrected-to-normal vision and were between 18 and 45 years old. All participants had a 100% approval rate on the platform. Of the total sample, 300 participants completed Experiment 1: in Experiment 1A, 150 participants completed the competence rating task first and then the election judgment task; in Experiment 1B, 150 participants completed the election judgment task first and then the competence rating task. In each of Experiments 1A and 1B, 75 female and 75 male participants were recruited. Similarly, the other 300 participants with the same demographics completed Experiment 2 (150 participants in Experiment 2A, who completed the competence judgment task first, and 150 participants in Experiment 2B, who completed the election judgment task first). The study was approved by the New York University Abu Dhabi Institutional Review Board. All participants gave informed written consent prior to the experiment. All methods were performed in accordance with the relevant guidelines and regulations.

Following the pre-registered exclusion criteria (i.e. excluding data of participants who met any of the following: response times below 200 ms or over 5000 ms on more than 10% of the trials in either task; over 15% of consecutive trials with the same response key in the competence rating task; a standard deviation below 0.5 in the competence ratings), data from 489 participants were included in the analyses (Experiment 1A: N = 110; Experiment 1B: N = 125; Experiment 2A: N = 121; Experiment 2B: N = 133). In the remaining data, trials with response times below 200 ms or more than 3 standard deviations from each participant’s average response time were excluded from the analyses (1.1 to 2.3% of the total trials in each experiment).
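The trial-level exclusion rule (responses faster than 200 ms, or more than 3 standard deviations from a participant's mean response time) can be sketched as follows. This is an illustrative numpy implementation, not the study's actual analysis code; the function name and data layout are assumptions.

```python
import numpy as np

def trial_keep_mask(rts_ms, fast_cutoff_ms=200, sd_cutoff=3):
    """Return a boolean mask of trials to keep for one participant.

    A trial is excluded if its response time is below ``fast_cutoff_ms``
    or deviates from the participant's mean by more than ``sd_cutoff``
    standard deviations.
    """
    rts = np.asarray(rts_ms, dtype=float)
    mean, sd = rts.mean(), rts.std(ddof=1)
    return (rts >= fast_cutoff_ms) & (np.abs(rts - mean) <= sd_cutoff * sd)
```

The same mask would be computed per participant and per task before aggregating ratings or accuracies.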

Stimuli

The competence rating and election judgment tasks used an identical set of images of 48 White male politicians from the United States. All original images were freely available on the Internet and all faces were shown in frontal view. The 48 candidates formed 24 pairs of finalists (winners and runners-up) from 18 United States Congressional (16 Senate and 2 House of Representatives), 4 gubernatorial, and 2 mayoral elections that took place between 1996 and 2021. We used a variety of elections mainly because the selected portraits needed to be successfully imported into software for 3D transformation, and not all portraits were suitable for this purpose. Note that in the final sample, Democratic candidates won half of the elections and Republican candidates won the other half.

The images used in Experiment 1 were standardized using Adobe Photoshop: only the face and a small part of the shoulders were kept for each image, and any background was removed. For the images used in Experiment 2, we used FaceGen Modeller Pro (Singular Inversions, Toronto, Canada) to convert the images from Experiment 1 into 3D models with a standard face template without hair, and the frontal view of each model was saved. Figure 1 illustrates sample images of one pair of faces with the transformations used in Experiments 2–4. All images in Experiments 1 and 2 were resized to 284 pixels in height and approximately 220 pixels in width, converted to grayscale, and equated in luminance across images using the SHINE toolbox31.
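The SHINE toolbox runs in MATLAB; as a rough illustration of what mean-luminance equating does, here is a simplified numpy analogue. It is a sketch in the spirit of SHINE's lumMatch function (which additionally matches contrast), not the actual preprocessing pipeline.

```python
import numpy as np

def match_mean_luminance(images, target=None):
    """Shift each grayscale image so all share the same mean luminance.

    ``images`` is a list of 2-D arrays with values in [0, 255]. If
    ``target`` is None, the grand mean across all images is used.
    Shifted values are clipped back into the valid range.
    """
    if target is None:
        target = np.mean([img.mean() for img in images])
    return [np.clip(img - img.mean() + target, 0, 255) for img in images]
```

After this step, low-level brightness differences can no longer distinguish one candidate's image from another's.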

Figure 1

Example images of one pair of faces: computerized faces (Experiment 2), modified faces with masculinized facial features of one face and feminized facial features of another face (Experiments 3 and 4). These faces are for illustration only and were not actual stimuli used in the experiments. Permissions available from the first author.

Procedure

Each participant completed two tasks. Apart from the differences in stimuli, the task procedures were identical in Experiments 1 and 2. In the competence rating task, participants rated the perceived competence (“How competent do you think this person is?”) of each of the 48 candidate images once, using a 7-point scale (1: not at all; 7: very much). On each trial, a candidate image was presented for 100 ms. Participants then had unlimited time to respond by key press. The presentation order of the images was randomized for each participant.

In the election judgment task, participants selected the perceived winner (“Who would win an election?”) from each of the 24 pairs of finalists once, using the two response keys corresponding to the faces presented on the left and right sides of the screen. On each trial, the face pair was presented for 1000 ms. Half of the winners were shown on the left and the other half on the right. The presentation order of the face pairs was randomized for each participant.

For both tasks, participants were given practice trials using similar faces that were not part of the main study. After completing both tasks, participants were asked whether they recognized any of the faces or whether any faces seemed familiar in a 10-choice question (“Did any of the faces you just saw look familiar to you, or can you name any of them?”). The entire study lasted approximately 8 min.

Results

We first report the overall competence ratings and election outcome prediction accuracy in all groups (Experiments 1A, 1B, 2A, and 2B; Table 1). We then focus on the data from participants who saw the images for the first time in each experiment. To examine the role of facial features, we compared the competence ratings between Experiments 1A (original faces; competence judgment first) and 2A (computerized faces; competence judgment first), and the election outcome prediction accuracy between Experiments 1B (original faces; election judgment first) and 2B (computerized faces; election judgment first) (Fig. 2). To examine the relationship between competence evaluation and election outcome prediction, we also report the correlation between the competence ratings and election outcome prediction accuracy for each type of face (i.e. Experiment 1A: original faces, competence judgment first vs. Experiment 1B: original faces, election judgment first; Experiment 2A: computerized faces, competence judgment first vs. Experiment 2B: computerized faces, election judgment first; Fig. 3).

Figure 2

Results from participants who saw the faces for the first time. Left panel: Overall competence ratings for images of the winners and runners-up across the experiments (Experiments 1A, 2A, 3A, and 4A). Right panel: Proportion of correctly predicted outcome across the experiments (Experiments 1B, 2B, 3B, and 4B). M_winners: masculinized winners; F_winners: feminized winners.

Figure 3

Correlations between the differences in competence ratings between each winner-runner-up pair (ratings from Experiments 1A, 2A, 3A, or 4A) and the proportion of correctly predicted outcome for each pair (prediction accuracy from Experiments 1B, 2B, 3B, or 4B) for original images (Experiment 1), computerized face images (Experiment 2), images of masculinized winners and feminized runners-up (Experiment 3) and images of feminized winners and masculinized runners-up (Experiment 4).

Competence ratings within each participant group

We first examined whether the faces of the winners were rated as more competent than the faces of the runners-up (Table 1). Intraclass correlations of the competence ratings are reported in Table 2 and showed results similar to previous findings32. With the original photographs in Experiment 1, the overall competence ratings were significantly higher for the winners (Experiment 1A: M = 4.52, SE = 0.073; Experiment 1B: M = 4.51, SE = 0.057) than the runners-up (Experiment 1A: M = 4.32, SE = 0.072; Experiment 1B: M = 4.26, SE = 0.055), Experiment 1A: t(109) = 6.68, Cohen’s d = 0.637, p < 0.001; Experiment 1B: t(124) = 8.45, Cohen’s d = 0.756, p < 0.001.

Table 2 Intraclass correlation coefficients (ICC) of the competence ratings for the images of the political candidates in Experiments 1–4.

Importantly, for the computerized faces in Experiment 2, the overall competence ratings were still significantly higher for the winners (Experiment 2A: M = 4.04, SE = 0.061; Experiment 2B: M = 4.23, SE = 0.055) than the runners-up (Experiment 2A: M = 3.93, SE = 0.064; Experiment 2B: M = 4.09, SE = 0.057): Experiment 2A: t(120) = 3.74, Cohen’s d = 0.340, p < 0.001; Experiment 2B: t(132) = 5.07, Cohen’s d = 0.440, p < 0.001.
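The within-group comparisons above are paired t-tests on each participant's mean ratings for winners versus runners-up, with Cohen's d computed from the difference scores. A hand-rolled numpy sketch (illustrative; the study's analysis software is not specified here):

```python
import numpy as np

def paired_t(winner_means, runner_up_means):
    """Paired t-test on per-participant mean ratings.

    Returns (t, cohens_d, df), where Cohen's d is the mean difference
    divided by the standard deviation of the differences, matching the
    within-subjects effect sizes reported above.
    """
    d = np.asarray(winner_means, float) - np.asarray(runner_up_means, float)
    n = d.size
    sd = d.std(ddof=1)
    t = d.mean() / (sd / np.sqrt(n))
    return t, d.mean() / sd, n - 1
```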

Election outcome prediction accuracy within each participant group

We then examined whether the winners were more likely than the runners-up to be selected to win elections. In both Experiments 1 and 2, the percentage of correct guesses for the winners was significantly above chance (50%): Experiment 1A: M = 56.4%, SE = 0.90%, t(109) = 7.09, Cohen’s d = 0.676, p < 0.001; Experiment 1B: M = 56.8%, SE = 0.89%, t(124) = 7.58, Cohen’s d = 0.678, p < 0.001; Experiment 2A: M = 55.0%, SE = 0.82%, t(120) = 6.10, Cohen’s d = 0.555, p < 0.001; Experiment 2B: M = 55.7%, SE = 0.75%, t(132) = 7.56, Cohen’s d = 0.656, p < 0.001. These results suggest that the winners were more likely to be selected to win elections, regardless of whether the original photographs or the computerized images were shown.
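The above-chance tests are one-sample t-tests of the per-participant prediction accuracies against 0.5. A minimal numpy sketch of that comparison (the helper name is illustrative):

```python
import numpy as np

def t_vs_chance(accuracies, chance=0.5):
    """One-sample t-test of per-participant accuracies against chance.

    Returns (t, cohens_d, df); d is the mean deviation from chance
    divided by the sample standard deviation.
    """
    a = np.asarray(accuracies, float) - chance
    n = a.size
    sd = a.std(ddof=1)
    t = a.mean() / (sd / np.sqrt(n))
    return t, a.mean() / sd, n - 1
```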

Comparisons between Experiments 1A and 2A on overall competence ratings across participant groups who saw the images for the first time

To examine whether the difference in competence ratings between the winners and the runners-up was larger for the original images than for the computerized images, a two-way ANOVA with a within-subjects factor (Winning status: winners vs. runners-up) and a between-subjects factor (Experiment: 1A, original images, competence judgment first vs. 2A, computerized images, competence judgment first) was conducted on the competence ratings. As expected, a significant main effect of Winning status revealed higher competence ratings for winners than runners-up, F(1, 229) = 53.67, ηp² = 0.190, p < 0.001. The main effect of Experiment was also significant, with higher competence ratings for the original images in Experiment 1A than for the computerized images in Experiment 2A, F(1, 229) = 22.1, ηp² = 0.088, p < 0.001. Moreover, the interaction between Winning status and Experiment was significant, F(1, 229) = 4.03, ηp² = 0.017, p = 0.046. Although the winners were rated as more competent than the runners-up in both Experiment 1A (t(229) = 6.45, p_tukey < 0.001) and Experiment 2A (t(229) = 3.85, p_tukey < 0.001), the difference in competence ratings between winners and runners-up was larger for the original images than for the computerized images, t(229) = 2.01, Cohen’s d = 0.265, p = 0.046, presumably due to the additional information available in the original images.
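In a 2 (within) × 2 (between) mixed design like this one, the Winning status × Experiment interaction is equivalent to an independent-samples t-test on each participant's winner-minus-runner-up difference score, with F(1, df) = t². A numpy sketch of that equivalence (illustrative, not the reported ANOVA code):

```python
import numpy as np

def interaction_t(diff_group1, diff_group2):
    """Pooled-variance independent t-test on difference scores.

    In a 2x2 mixed ANOVA, the interaction F(1, n1 + n2 - 2) equals the
    square of the t statistic returned here. Returns (t, df).
    """
    x = np.asarray(diff_group1, float)
    y = np.asarray(diff_group2, float)
    n1, n2 = x.size, y.size
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```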

Comparisons between Experiments 1B and 2B on overall election outcome prediction accuracy across participant groups who saw the images for the first time

We also examined whether election outcome prediction accuracy differed between the original and computerized images. There was no significant difference between the original images (Experiment 1B; election judgment first) and the computerized images (Experiment 2B; election judgment first): t(256) = 0.944, Cohen’s d = 0.118, p = 0.346.

Correlations between competence rating differences and election outcome prediction accuracy

We then examined whether the relative differences in competence ratings between the candidates positively correlated with the proportion of participants selecting the winner within each pair. Because all participants completed both the competence judgment and election outcome prediction tasks, the correlations could be analyzed both within and across participant groups who viewed the same type of faces (i.e. either original or computerized). The correlation results within each participant group are reported in Table 1. To highlight the consistency in judgments across participants, we report below the correlations of the two measures across different groups of participants who saw the same type of faces (e.g. original faces) for the first time (Fig. 3). For the original images in Experiment 1, a significant correlation was observed between the competence rating differences of the winner-runner-up pairs in Experiment 1A (original faces; competence judgment first) and the election outcome prediction accuracy for the pairs in Experiment 1B (original faces; election judgment first), r(22) = 0.763, p < 0.001. For the computerized images in Experiment 2, a similar significant correlation was observed between the competence rating differences in Experiment 2A (computerized faces; competence judgment first) and the election outcome prediction accuracy in Experiment 2B (computerized faces; election judgment first), r(22) = 0.892, p < 0.001.
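This across-group analysis pairs, for each of the 24 candidate pairs, the mean competence-rating difference from one participant group with the proportion of participants in the other group who picked the actual winner, then correlates the two. A minimal numpy sketch (function name illustrative):

```python
import numpy as np

def pair_correlation(rating_diffs, prediction_accuracy):
    """Pearson correlation between per-pair competence-rating differences
    (winner minus runner-up) and per-pair winner-prediction accuracy.

    Returns (r, df); with 24 candidate pairs, df = 22, i.e. r(22).
    """
    r = np.corrcoef(rating_diffs, prediction_accuracy)[0, 1]
    return r, len(rating_diffs) - 2
```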

Discussion

Replicating previous findings on competence judgments from politician photographs4,6,10,13, participants who were unfamiliar with the politicians perceived the winning candidates to be more competent than their opponents, based merely on briefly presented images. Although the participants who completed the competence rating task first were not aware of the pairings of the candidates in the actual elections and did not see the candidate pairs simultaneously (i.e. Experiments 1A and 2A), the winners were nonetheless rated as more competent overall than the runners-up.

Perhaps unsurprisingly, participants generally provided lower competence ratings for the computerized faces (Experiment 2A) than for the original portraits (Experiment 1A), presumably because the original portraits were more natural in appearance and the candidates were all shown in formal suits30. Nonetheless, although the computerized faces might not have captured all aspects of the candidates’ facial characteristics, the winner faces were still rated as significantly more competent than the runner-up faces. While the overall difference in competence ratings between winners and runners-up was larger for the original portraits, which contained additional facial or non-facial information, than for the computerized face images, these results suggest that facial features are critical in shaping participants’ perceptions of the candidates’ competence.

Moreover, participants also performed significantly above chance in predicting the winner of each candidate pair, regardless of whether additional information was present (Experiment 1) or only facial features were present (Experiment 2), and regardless of whether participants saw the images for the first time (Experiments 1B and 2B) or the second time (Experiments 1A and 2A). For election outcome prediction, the additional information resulted in only numerically, but not statistically significantly, higher accuracy compared with when only facial features were present. These results suggest that although competence ratings of individual faces were influenced by the additional information in the original images, the relative judgment between winners and runners-up was similar for the original and computerized images. Participants remained sensitive to the relative differences among the computerized face pairs when evaluating the candidates’ likely election success, suggesting that facial information is critical for the evaluation.

We also found positive correlations between the relative differences in perceived competence between the winners and runners-up and the election outcome prediction accuracy: winners who were rated as more competent than their runners-up were more likely to be predicted to win the elections. These results were observed for both the original and computerized image sets, providing further evidence that perceived competence from facial features is utilized when evaluating likely election outcomes.

To further investigate the effect of facial features on perceived competence and election outcome prediction, we manipulated the facial features of the candidates. Since gender stereotypes and typicality have been shown to be closely related to perceived competence, with increased facial masculinity related to higher perceived competence10,21, we altered the politicians’ faces by increasing the masculinity or femininity of their facial features. In Experiment 3, the winner faces were masculinized and the runner-up faces were feminized; conversely, in Experiment 4, the winner faces were feminized and the runner-up faces were masculinized. Replicating and extending Experiment 2, we hypothesized that in Experiment 3 the masculinized winners would be perceived as more competent and would be more likely to be predicted as the winners, compared with the feminized runners-up. In Experiment 4, however, the increased masculinity of the runners-up might overcome any initial advantage of the winners in perceived competence or predicted election success, especially since the winners were feminized.

Similar to Experiments 1 and 2, all participants in Experiments 3 and 4 completed both the competence rating task and the election judgment task. The presentation order of the two tasks was counter-balanced across participants within each experiment: half of the participants completed the competence rating task prior to the election judgment task (Experiments 3A and 4A); the other half of the participants completed the election judgment task prior to the competence rating task (Experiments 3B and 4B). Likewise, the main analyses between competence rating and election outcome prediction focused on the comparison across participants as they viewed the faces for the first time (e.g. competence ratings in Experiment 3A vs. Experiment 4A; competence ratings in Experiment 3A vs. election outcome prediction in Experiment 3B), but the analyses were also conducted within each participant group (e.g. within Experiment 3A and within Experiment 3B).

Experiments 3 and 4

Method

Participants

Another group of 600 participants with the same demographics as those in Experiments 1 and 2 took part online via the Prolific platform (www.prolific.co): 300 participants completed Experiment 3 and the other 300 completed Experiment 4. Within each experiment, 150 participants completed the competence rating task first and then the election judgment task (Experiments 3A and 4A), and the other 150 completed the election judgment task first and then the competence rating task (Experiments 3B and 4B). The study was approved by the New York University Abu Dhabi Institutional Review Board. All participants gave informed written consent prior to the experiment. All methods were performed in accordance with the relevant guidelines and regulations. Following the pre-registered exclusion criteria, data from 498 participants were included in the analyses (Experiment 3A: N = 122; Experiment 3B: N = 126; Experiment 4A: N = 123; Experiment 4B: N = 127). In the remaining data, trials with response times below 200 ms or more than 3 standard deviations from each participant’s average response time were excluded from the analyses (1 to 2.2% of the total trials in each experiment).

Stimuli and procedure

Using FaceGen Modeller Pro (Singular Inversions, Toronto, Canada), we manipulated the gender characteristics of the facial features of the 3D face models from Experiment 2 by increasing either the masculinity or the femininity of the faces. The adjustments were based on the ‘face space’ in FaceGen, which was created using principal components analysis from a data set of 273 high-resolution 3D face scans of individuals from different gender (60% male and 40% female), age (M = 29.71 years old, SD = 9.30), and racial groups (approximately 67% European, 11% East Asian, 9% African, 3% South Asian, 10% others). Both shape and color information made up the face space: there were 80 shape dimensions (50 dimensions of symmetric shape and 30 dimensions of asymmetric shape) and 50 dimensions of symmetric color. The gender slider is a linear dimension aligned with the difference between the average male and the average female face, scaled such that the male average has a value of − 1 and the female average a value of + 1. For each face model of the politicians, a masculinized version and a feminized version were created (see Fig. 1): the value on the gender slider of each modelled face was decreased by 2 (i.e. − 2) for the masculinized version and increased by 2 (i.e. + 2) for the feminized version. Because the values of the masculinized and feminized versions were below the male average (i.e. − 1) or above the female average (i.e. + 1), masculinity or femininity was made salient in each version. Experiment 3 used the masculinized winner faces and feminized runner-up faces; Experiment 4 used the feminized winner faces and masculinized runner-up faces. Apart from the facial manipulations, all other aspects of the experiments were identical to Experiments 1 and 2.
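On FaceGen's gender dimension (male average at − 1, female average at + 1), the manipulation reduces to simple offsets on the slider value. A sketch, assuming each fitted face model carries a single scalar gender value (the function and its inputs are illustrative, not FaceGen's API):

```python
def gender_versions(fitted_value, offset=2.0):
    """Return (masculinized, feminized) gender-slider values for a face.

    Decreasing the slider by ``offset`` pushes the face below the male
    average (-1); increasing it pushes the face above the female average
    (+1), making gender-typical cues salient in each version.
    """
    return fitted_value - offset, fitted_value + offset
```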

Results

We first report the overall competence ratings and election outcome prediction accuracy in all groups (Experiments 3A, 3B, 4A, and 4B; Table 1). We then focus on the data when participants saw the faces for the first time in each experiment: we compared the competence ratings between Experiments 3A (masculinized winners and feminized runners-up; competence judgment first) and 4A (feminized winners and masculinized runners-up; competence judgment first), and the election outcome prediction accuracy between Experiments 3B (masculinized winners and feminized runners-up; election judgment first) and 4B (feminized winners and masculinized runners-up; election judgment first) (Fig. 2). We also report within each experiment the correlation between the competence ratings and election outcome prediction (i.e., for masculinized winners and feminized runners-up, Experiments 3A: competence judgment first vs. 3B: election judgment first; for feminized winners and masculinized runners-up, Experiments 4A: competence judgment first vs. 4B: election judgment first; Fig. 3).

Competence ratings within each participant group

We examined whether the faces of the winners were rated as more competent than the faces of the runners-up in each experiment. Intraclass correlations of the competence ratings are reported in Table 2. For the face images of masculinized winners and feminized runners-up (Experiment 3), the overall competence ratings were significantly higher for the winners (Experiment 3A: M = 4.03, SE = 0.058; Experiment 3B: M = 4.39, SE = 0.051) than for the runners-up (Experiment 3A: M = 3.73, SE = 0.068; Experiment 3B: M = 3.84, SE = 0.071), Experiment 3A: t121 = 5.93, Cohen’s d = 0.537, p < 0.001; Experiment 3B: t125 = 9.41, Cohen’s d = 0.838, p < 0.001.

In contrast, for the face images of feminized winners and masculinized runners-up (Experiment 4), a different pattern of results was observed: the overall competence ratings were significantly higher for the runners-up (M = 4.03, SE = 0.054) than for the winners (M = 3.94, SE = 0.059) in Experiment 4A, t122 = − 2.27, Cohen’s d = − 0.205, p = 0.025, and there was no significant difference between the winners (M = 4.06, SE = 0.054) and the runners-up (M = 4.10, SE = 0.053) in Experiment 4B, t126 = − 0.909, Cohen’s d = − 0.0807, p = 0.365.
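The within-group comparisons above are paired-samples t-tests. As an illustrative sketch (not the authors' analysis code, and with toy ratings rather than the study data), the t statistic and Cohen's d for a paired design can be computed as the mean of the within-participant differences divided by, respectively, the standard error and the standard deviation of those differences:

```python
import math

def paired_t(x, y):
    # Paired-samples t-test: t statistic, Cohen's d (mean difference divided
    # by the SD of the differences), and degrees of freedom (n - 1).
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    cohens_d = mean / sd
    return t, cohens_d, n - 1

# Toy data: each participant's mean rating of winners and of runners-up.
winner_ratings = [4.2, 4.5, 3.9, 4.8, 4.1, 4.6]
runner_up_ratings = [3.8, 4.0, 3.7, 4.1, 3.9, 4.2]
t, d_eff, df = paired_t(winner_ratings, runner_up_ratings)
```

With real data the t value would be compared against the t distribution with n − 1 degrees of freedom (121, 122, 125, or 126 in the experiments above) to obtain the p value.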

Election outcome prediction accuracy within each participant group

We then examined whether the winners were more likely than the runners-up to be selected to win elections. Participants who saw masculinized winners and feminized runners-up performed significantly above chance in predicting the winners in both Experiment 3A (M = 64.7%, SE = 1.28%), t121 = 11.5, Cohen’s d = 1.041, p < 0.001, and Experiment 3B (M = 59.3%, SE = 1.15%), t125 = 8.08, Cohen’s d = 0.720, p < 0.001. However, participants who saw feminized winners and masculinized runners-up performed at chance in predicting the actual winners in both Experiment 4A (M = 50.0%, SE = 1.13%), t122 = 0.0427, Cohen’s d = 0.004, p = 0.483, and Experiment 4B (M = 51.2%, SE = 1.11%), t126 = 1.09, Cohen’s d = 0.095, p = 0.142. These results suggest that increased masculinity in the winner faces coupled with increased femininity in the runner-up faces preserved the relatively high prediction accuracy for the winners, whereas increased femininity in the winner faces coupled with increased masculinity in the runner-up faces minimized the effect.
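The chance-level tests above amount to one-sample t-tests of mean prediction accuracy against 0.5. A minimal sketch, using hypothetical per-participant accuracies rather than the study data:

```python
import math

def one_sample_t(scores, mu=0.5):
    # One-sample t-test of mean prediction accuracy against chance (50%).
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    t = (mean - mu) / (sd / math.sqrt(n))
    cohens_d = (mean - mu) / sd
    return t, cohens_d, n - 1

# Toy data: each participant's proportion of pairs predicted correctly.
accuracy = [0.625, 0.70, 0.55, 0.60, 0.675, 0.65, 0.58, 0.62]
t, d, df = one_sample_t(accuracy)
```

A positive t significantly larger than zero (against the t distribution with n − 1 degrees of freedom) indicates above-chance prediction, as in Experiments 3A and 3B; a t near zero, as in Experiments 4A and 4B, indicates chance-level performance.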

Comparisons between experiments 3A and 4A for overall competence across participant groups who saw the images for the first time

To examine whether the differences in competence ratings between the winners and the runners-up were altered by the masculinity vs. femininity manipulations, we conducted a two-way ANOVA on competence ratings with a within-subjects factor (Winning status: winners vs. runners-up) and a between-subjects factor (Experiment: 3A, masculinized winners/feminized runners-up, competence judgment first; 4A, feminized winners/masculinized runners-up, competence judgment first). The ANOVA revealed a significant main effect of Winning status, with higher competence ratings for winners than runners-up, F1,243 = 10.4, ηp2 = 0.041, p = 0.001. The main effect of Experiment was not significant, F1,243 = 1.84, ηp2 = 0.008, p = 0.176. Importantly, a significant interaction between Winning status and Experiment, F1,243 = 36.7, ηp2 = 0.131, p < 0.001, revealed that winners were rated more competent than runners-up in Experiment 3A (t243 = 6.55, ptukey < 0.001) but not in Experiment 4A (t243 = 2.01, ptukey = 0.188), with a significantly larger difference in competence ratings between masculinized winners and feminized runners-up than between feminized winners and masculinized runners-up, t243 = 6.06, Cohen’s d = 0.774, p < 0.001.
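The key interaction test can be illustrated as follows. In a 2 (within-subjects) × 2 (between-subjects) design, the Winning status × Experiment interaction is equivalent to an independent-samples t-test on the within-subject difference scores (the interaction F equals t²). A sketch with toy difference scores (values illustrative only, not the study data):

```python
import math

def independent_t(x, y):
    # Pooled-variance independent-samples t-test. Applied to within-subject
    # difference scores, it tests the 2 x 2 mixed-design interaction.
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)          # pooled variance
    t = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Toy difference scores (winner rating minus runner-up rating per participant).
diff_exp3a = [0.4, 0.5, 0.3, 0.6, 0.2]     # masculinized winners: positive gap
diff_exp4a = [-0.1, 0.0, -0.2, 0.1, -0.1]  # feminized winners: gap near zero
t, df = independent_t(diff_exp3a, diff_exp4a)
```

A significantly positive t here corresponds to the larger winner-advantage in Experiment 3A than in Experiment 4A reported above.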

Comparisons between experiments 3B and 4B for overall election outcome prediction accuracy across participant groups who saw the images for the first time

When examining whether election outcome prediction accuracy was also affected by the masculinity vs. femininity manipulation, we found a significant difference between the two sets of images used in Experiments 3B and 4B, t251 = 5.06, Cohen’s d = 0.637, p < 0.001, with higher election outcome prediction accuracy when the winner faces were masculinized and the runner-up faces were feminized, compared with when the winner faces were feminized and the runner-up faces were masculinized.

Correlations between competence rating difference and election outcome prediction accuracy across participant groups who saw the images for the first time

We then examined whether the relative differences in competence ratings between the candidates and the proportion of winners selected to win would remain correlated for the masculinized and feminized faces. The correlations conducted within each participant group are reported in Table 1. To highlight the consistency in judgments across participants, the results reported below are the correlations of the two measures across different groups of participants (Fig. 3). Interestingly, significant correlations were observed between the competence rating differences of the winner-loser pairs in Experiment 3A (competence judgment first) and the election outcome prediction accuracy in Experiment 3B (election judgment first), r22 = 0.799, p < 0.001, and between the competence rating differences in Experiment 4A (competence judgment first) and the election outcome prediction accuracy in Experiment 4B (election judgment first), r22 = 0.887, p < 0.001, regardless of the masculinity or femininity manipulations on the winner or runner-up faces.
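The cross-group correlations reported above are Pearson correlations computed over candidate pairs; with 24 pairs, the degrees of freedom are 24 − 2 = 22. A minimal sketch with hypothetical values for six pairs (not the study data):

```python
import math

def pearson_r(x, y):
    # Pearson correlation between per-pair competence-rating differences
    # (from one participant group) and prediction accuracy (from another),
    # returned with its degrees of freedom (n - 2).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy), n - 2

# Toy data for six candidate pairs (the study used 24 pairs, hence r22).
rating_diff = [0.5, 0.1, -0.2, 0.8, 0.3, 0.0]       # winner minus runner-up
pred_accuracy = [0.70, 0.55, 0.45, 0.80, 0.60, 0.50]  # proportion choosing winner
r, df = pearson_r(rating_diff, pred_accuracy)
```

A strong positive r indicates that pairs whose winners look relatively more competent to one group are also the pairs whose winners another group is more likely to pick.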

Additional comparisons between experiments 3 and 4 with experiment 2

To further evaluate how the gender manipulations affected the competence ratings and election outcome prediction, additional analyses were conducted between Experiment 2 (no gender manipulation) and Experiment 3 (masculinized winners and feminized runners-up), and between Experiment 2 and Experiment 4 (feminized winners and masculinized runners-up). To anticipate the results, in both sets of analyses the gender manipulations significantly influenced the differences in competence ratings and election outcome prediction between the winners and runners-up.

For competence ratings between Experiments 2A (computerized faces; competence judgment first) and 3A (masculinized winners and feminized runners-up; competence judgment first), there was a significant main effect of Winning status, F1,241 = 48.96, ηp2 = 0.021, p < 0.001. The main effect of Experiment was not significant, F1,241 = 1.60, ηp2 = 0.006, p = 0.208. Importantly, there was a significant interaction between Winning status and Experiment, F1,241 = 9.88, ηp2 = 0.004, p = 0.002: although the winners were rated more competent than the runners-up in both Experiments 2A (t241 = 2.720, ptukey = 0.035) and 3A (t241 = 7.185, ptukey < 0.001), the difference in competence ratings between the winners and runners-up was larger in Experiment 3A than in Experiment 2A (t241 = − 3.14, Cohen’s d = − 0.403, p = 0.002).

For competence ratings between Experiments 2A (computerized faces; competence judgment first) and 4A (feminized winners and masculinized runners-up; competence judgment first), neither main effect was significant: Winning status, F1,242 = 0.196, ηp2 = 0.001, p = 0.658; Experiment, F1,242 < 0.001, ηp2 < 0.001, p = 0.994. However, a significant interaction between Winning status and Experiment, F1,242 = 16.460, ηp2 = 0.064, p < 0.001, revealed that the winners were rated more competent than the runners-up in Experiment 2A without the gender manipulation (t242 = 3.169, ptukey = 0.009), whereas in Experiment 4A the difference in competence ratings between the winners and runners-up was not statistically significant (t242 = − 2.566, ptukey = 0.053), with the competence ratings for the feminized winners only numerically lower than those for the masculinized runners-up.

Likewise, for election outcome prediction, accuracy was lower in Experiment 2B (computerized faces; election prediction first) than Experiment 3B (masculinized winners and feminized runners-up; election prediction first), t257 = − 2.67, Cohen’s d = − 0.332, p = 0.008, but was higher in Experiment 2B (computerized faces; election prediction first) than Experiment 4B (feminized winners and masculinized runners-up; election prediction first), t258 = 3.36, Cohen’s d = 0.417, p < 0.001.

Discussion

Across Experiments 3 and 4, we found consistent evidence that gender manipulations of facial features had a strong impact on the perceived competence and predicted election outcomes of politicians: candidates were rated as more competent with increased facial masculinity and as less competent with increased facial femininity. Whereas Experiment 2 had already shown that the overall differences in perceived competence and election outcome prediction could be based on the existing differences in facial features between the winners and runners-up, Experiment 3 revealed that this advantage for the winners remained robust when the facial features of the winners were masculinized and the facial features of the runners-up were feminized. In contrast, feminized winners lost this advantage over masculinized runners-up. Instead, these faces appeared to be perceived similarly: in Experiment 4, there was no significant difference in perceived competence between the winner and runner-up faces, and election outcome prediction performance fell to chance.

Although the gender manipulations affected judgments of the manipulated faces, the relative judgments among the face pairs remained highly stable, as revealed by the positive correlations between perceived competence and prediction outcome accuracy among the candidates in both Experiments 3 and 4. Despite the increased advantage for the winners in Experiment 3 and the diminished advantage for the winners in Experiment 4, the facial characteristics used to evaluate the individual candidates by one group of participants were highly similar to those used to predict the winners by another group of participants. The high consensus of these judgments suggests the robustness of the influence of gendered facial characteristics, and not necessarily the existing facial features, on inferred competence.

General discussion

The goal of this study was to directly investigate the role of the facial features of political candidates in perceived competence and election outcome prediction, specifically through direct manipulations of facial features applied to real-world politician faces. Using images of the opposing candidates in real-world elections, we used the original photographs of the candidates, which included additional facial and non-facial features (Experiment 1), and images of the politicians with only facial features extracted (Experiment 2). More importantly, we manipulated the facial features by increasing either masculinity or femininity (Experiments 3 and 4). Participants who had no prior knowledge of the candidates completed two tasks in a counter-balanced order: they either first provided competence ratings based on briefly presented images of individual candidates and then predicted the winner from each pair of candidates shown simultaneously, or vice versa.

It is interesting to note that although there might be a subtle difference between asking participants to cast a hypothetical vote (i.e. “Who would you vote for?”)4,6 and asking them to predict or guess who would win or who won, either question has been shown to lead to similar results3. Indeed, Experiment 1 replicated and extended previous findings3,4,6 that, overall, the winners in the elections were perceived to be more competent and were more likely to be predicted to win the elections than the runners-up. Critically, although it is possible that the computerized images might not completely capture all subtle facial characteristics, Experiment 2 showed that the results from Experiment 1 were at least partially due to the perceptions and interpretations of the politicians’ facial features, since the same result patterns were found for the computerized images of the politician faces when additional information, including some facial features and all non-facial features, was removed. By systematically comparing the results for the original photographs with those for the computerized facial images, we found that the influence of additional facial or non-facial features from the original images was indeed present in competence evaluation, as participants also utilized additional information, such as clothing, to shape their impressions30,33 when it was available. Although the differences in competence ratings between the winners and the runners-up were larger for the original than for the computerized images, it is important to note that the comparable election outcome prediction accuracy between the original and computerized faces suggests that the relative differences between the winners and runners-up in the original images persisted in the computerized images.
The current findings using computerized real-world politician faces provide direct evidence that facial features alone are informative for naïve participants to quickly form judgments about the politicians and to correctly predict the actual election results better than chance.

More importantly, the present study provided an experimental manipulation to reveal the influence of facial features on both competence and election prediction judgments at first glance. When gender manipulations were directly applied to the same real-world politician faces in Experiments 3 and 4, we found that relatively subtle manipulations of facial features can have robust effects on perceived competence and predicted election outcomes. Note that since the original facial features largely remained, the changes to the features along the masculinity-femininity continuum were likely only obvious when the different versions of the faces were shown together (Fig. 1). Consistent with previous findings that masculinity is often associated with competence and that competent-looking individuals are more likely to be perceived as men than women10,21, we found that masculinized facial features further enhanced the winners’ advantage in perceived competence and predicted winning outcome over the feminized runners-up. In contrast, by feminizing the winner faces and masculinizing the runner-up faces, any existing differences between the faces appeared to have been minimized, resulting in a lack of difference in perceived competence or predicted election success. Together, these findings demonstrate that initial differences between the faces of the winners and runners-up indeed existed, and that the impressions could be changed by increasing or decreasing gendered facial characteristics, which may be an over-generalization of common gender stereotypes for leaders34.

It is also important to note that although the winner faces generally appeared to implicate competence or leadership quality, there were relative differences among different face pairs: some winners were rated much more competent than their runners-up while some less so, and some runners-up were rated more competent than their opponents instead (Fig. 3). Although all participants in this study were unfamiliar with the politicians, the relative difference in perceived competence and the election outcome prediction accuracy among candidate pairs was highly correlated in all experiments. Specifically, the politician faces that were consistently rated to be more competent were also more likely to be selected as the winners of the elections, regardless of the final election outcomes. Such correlation was observed both within and across participants (Table 1; Fig. 3). Notably, although Experiments 3 and 4 showed that perceptions and interpretations of personality traits of individual faces could be affected by gendered facial manipulations, the high consensus suggests that similar types of diagnostic facial information were used for evaluating competence and selecting the prospective winners among participants.

What are the diagnostic facial features for perceiving competence? While it is possible to quantify different levels of perceived competence on faces using data-driven approaches25,26,27, it remains difficult to detail the specific types of facial features that implicate an individual’s level of competence. Consistent with previous studies10,21, our results provide further support for the role of masculinity. Nonetheless, a potential limitation is that the current study examined this question using faces of White male politicians only. We are certainly aware of the potential issues in generalizing the findings to non-White or non-male samples35,36, but at the moment it is not yet feasible to collect a sufficiently large face image set of politicians from relatively diverse backgrounds (e.g. female) with matched qualities for the two final candidates in an election (e.g. both the winner and runner-up being female). Indeed, there have been only a relatively small number of previous elections in which the two final candidates were either both female or both of another gender. Therefore, it remains unclear whether the role of facial masculinity may also benefit non-male individuals, as facial sex typicality may influence perceived competence and vote choices for politicians20,22,23,37,38. Moreover, it is possible that the relationship between masculinity and leadership quality is a social construct rather than a biological one. With increasing diversity in the demographics of politicians and individuals in leadership roles34, future research may investigate how such changes would affect the stereotype of leaders and how different types of facial features might instead be used to infer competence or leadership quality from faces.

Conclusion

Although previous research has consistently reported the importance of visual facial cues of political candidates in predicting election outcomes, most studies used candidate photographs which also included additional facial or non-facial cues. By isolating and manipulating the facial features, this study provided direct evidence that facial features alone were used by observers to interpret other individuals’ personality traits or leadership ability. These findings are critical in constructing hypotheses on how different types of visual information39, and non-visual information, may be utilized for choosing leaders.