Introduction

In recent years, the use of conjoint experiments in political science has increased significantly, with many studies examining the social characteristics of political candidates preferred by voters (Bansak et al., 2021). These experiments typically use text labels to describe the candidates' most relevant social categories with direct descriptions. While this text method has been widely used to study voter prejudice related to age, gender, ethnicity and educational level, it has recently expanded to include other social categories such as sexual orientation, disability, or HIV status (Magni & Reynolds, 2021a, b; Reher, 2021).

However, voters usually do not receive a booklet with the social categories to which the candidates belong. Instead, they tend to acquire that information through visual or textual cues. Some recent conjoint experiments use cues such as names that sound like a woman or man or belong to an in- or out-group ethnicity (Lu et al., 2021; Ono & Zilis, 2022), which mimic low-information elections where voters learn about politicians' sociodemographics through their names on the ballot. In most elections, however, voters learn social characteristics of the candidates through visual elements. This trend has been stimulated by the mediatization of politics (Strömbäck & Esser, 2014). Political candidates commonly appear in televised debates, their faces are displayed on electoral street advertising, and voters receive political messages through social media. In some cases, voters may encounter the candidates' faces directly on the ballot (Ahler et al., 2017; Banducci et al., 2008; Moehler & Conroy-Krutz, 2016; Reynolds & Steenbergen, 2006). According to Reynolds and Steenbergen's (2006) analysis of 102 countries worldwide, more than 30 countries, mainly in Africa and Latin America but also Cyprus and Ireland, include photographs of the candidates on the ballot. This feature provides an additional piece of visual information in both high- and low-information elections.

This paper theorizes that the acquisition of social categories through visual cues influences voters' preferences for political candidates and therefore previous studies using text cues could be showing limited evidence of voters’ prejudice. Specifically, we expect that visual cues will result in more discriminatory preferences towards social categories that are visibly apparent, such as gender binaries and ethnic in- and out-groups. Our hypothesis is based on the priming literature, which suggests that visual cues cause voters to focus on physical attributes and assign greater importance to social categories than other available information (Moehler & Conroy-Krutz, 2016). Furthermore, we argue that visual cues may unleash discriminatory preferences by inhibiting voters' motivation to control prejudice (Mendelberg, 2001). However, we do not expect the visual effect to be consistent across all social categories. Drawing on the literature on coming out, which posits that in the absence of confirmatory cues, people who belong to invisible categories such as non-binary and LGB status are assumed to be members of normative categories (i.e., gender binaries and heterosexuality) (Martin, 2009; Sandler, 2022), we hypothesize that invisible categories such as non-binary and LGB status may entail less discrimination when voters are exposed to visual cues.

To test these hypotheses, we employed a visual conjoint survey experiment with 2324 German voters, varying whether respondents received information on candidates through explicit labels or pretested AI-generated candidate pictures. The results confirm our expectations that the way in which social categories are learned affects preferences.

Specifically, we found that the preference for women over men in candidate selection is much stronger when categories are presented with text labels as opposed to visual cues. While women still receive a bonus in candidate selection, it is substantially smaller when images are present. However, non-binary or LGB candidates, though still less preferred than other genders or heterosexual candidates, appear to be less disadvantaged by visual cues than by text labels. Our results also show that the effect of treatment mode is complex, and that the intersection of social categories can have a significant impact on candidate selection. Moreover, our results highlight the relevance of ideology in candidate preferences. With regard to the ethnicity of candidates, while we observe no differences between visual and textual preferences when respondents are aggregated across ideologies, differences emerge when we divide the respondents by ideology: right-wing respondents tend to reject non-white candidates to a greater extent when faced with visual cues, while left-wing voters seem to prefer them to a greater extent when the cues are visual. The effects of social categories on candidate selection are also moderated by ideology in the case of gender and sexual orientation.

Overall, our study sheds light on how the presentation of visible and invisible social categories affects political candidate selection and emphasizes the importance of considering the intersectionality of social categories and their relationship with ideology. Our findings contribute to the ongoing discourse on the impact of social categories in the choice of voters and may have implications for political campaigns and the study of voting behavior.

Literature Review

The Uncontrolled Effect of Visual Cues on Preferences

While previous research has shown that voters rely on social category heuristics in both visual and non-visual elections, there are various arguments indicating that visual cues prompt European voters to hold more discriminatory preferences than when social categories are inferred by name or directly presented to voters. One reason for this is the priming effect that may occur when social categories are visually perceived. Priming is a phenomenon where cues have the potential to lead individuals to downplay or ignore certain criteria in favor of others.

Although priming may arise from various factors, its effects often occur without any particular actor intending them. Research has shown that contextual elements such as a national flag (Carter et al., 2011) or the type of building in which people vote (Berger et al., 2008) can affect electoral choices due to priming effects. While visual cues may not be able to prime individuals to consider criteria that they previously considered irrelevant, it is expected that the potential for priming identity exists when voters learn about candidates through visual elements. This would explain why politicians try to take advantage of these identity-priming effects when there are images present. The use of images that reinforce gender (Valentino et al., 2002) or ethnic identity in contexts of strong ethnic polarization (Chandra, 2007) is a good example of this phenomenon. Moehler and Conroy-Krutz (2016) showed that seeing the faces of candidates in the 2011 Ugandan general elections primed voters about the ethnicity of candidates, which in turn led them to increase co-ethnic voting. In the context of Western Europe, we expect that visual priming of perceived identity will lead voters to pay greater attention to the social categories they consider relevant and, therefore, exacerbate the effect of their discriminatory preferences.

An additional factor that may lead to the intensification of discriminatory preferences through visual cues is the motivation to control prejudice. The underlying argument is that although voters might hold egalitarian views about diverse social groups, negative stereotypes and gut feelings may still be triggered when they are exposed to implicit cues (Mendelberg, 2001). Therefore, while voters may be able to control their discriminatory preferences when social categories are presented explicitly (i.e., candidates’ text descriptions), in the presence of implicit visual cues (i.e., candidates' pictures), voters have a more intuitive response, in which they are unable to control their discriminatory preferences. Accordingly, it has been shown that explicit appeals to race and gender are less effective than implicit appeals in evoking racial resentment, due to egalitarian social norms (Huber & Lapinski, 2006; Mendelberg, 2001; Rhodes et al., 2020). At the micro-level, differences between implicit and explicit gender attitudes of voters have also been found to be relevant to the election of women (Mo, 2015). Implicit attitudes also affect the perception of women as leaders (Eagly & Karau, 2002). The only experiment on the effect of visual cues, conducted by Abrajano et al. (2018), hypothesized that in the US voters exhibit more negative preferences towards Latino candidates when a picture is presented rather than a text description. However, their findings largely suggested the reverse. This outcome, as the authors suggest, could be because “white voters at the low end of the motivation-to-control-stereotyping scale simply respond more negatively to minority race/ethnicity the more they are encouraged to consider it” (Abrajano et al., 2018, p. 31).

In Western Europe, where conservative attitudes towards “social issues” have been declining steadily since the 80s (Caughey et al., 2019), and some suggest that egalitarian attitudes have been “nationalized” (Lægaard, 2007; Lancaster, 2022), it is expected that the motivation to control prejudice is strong. Thus, it is reasonable to anticipate the ‘uncontrolled’ effect of prejudice from visual cues in this context.

The Visibility of Social Categories

The interplay between priming and the motivation to control bias suggests that, ceteris paribus, voters should exhibit greater discriminatory preferences towards candidates' identities when exposed to visual cues as opposed to text descriptions. However, it is not expected that this effect will be of the same magnitude, or even present at all, for all social categories. While visual cues are expected to have a maximizing discriminatory effect for visible identities such as binary gender or ethnic ingroup/outgroups, this effect may be nuanced or even absent for invisible social categories such as LGBTQ+ status.

Regarding visible social categories, citizens are equipped to categorize individuals based on their physiognomy, physical appearance, names, and stereotypes. Extensive research shows that people are remarkably accurate (approaching ceiling) at deciding whether faces belong to a man or a woman, even when cues from hairstyle, makeup, and facial hair are minimised (Bruce et al., 1993, 2003). The same applies to ethnic ingroups and outgroups, with individuals tending to make ethnic divisions based on color and shape (Davidenko et al., 2008; Simpkins, 2014). It is worth noting, though, that these cues do not infallibly correspond to people's identity (Footnote 1) and can lead to discrimination and the perpetuation of stereotypes.

In contrast, research on coming out highlights the existence of invisible social categories: non-normative, stigmatized social categories that need to be expressed in order to be recognized and acknowledged by the broader public (Sandler, 2022). LGBTQ+ status is a paradoxical case of an identity that needs to be disclosed to be acknowledged. Through everyday behaviors that are internalized from a young age, (cis-)heterosexuality is privileged and assumed to be the natural norm (Martin, 2009). Therefore, in the case of invisible social categories such as LGBTQ+ status, visual cues may not have a significant discriminatory effect since individuals who identify as such may not display outwardly visible markers that categorize them as non-normative. That does not mean that citizens do not form impressions of others’ LGBTQ+ status. Research shows that people seek to categorize individuals in terms of sexual orientation and, to do so, make use of non-visual cues such as voice pitch, speech style, and behavior (e.g., Rule et al., 2008). Facial cues also serve as a tool for sexual orientation classification, although accuracy is significantly lower than for more visible identities such as binary genders and ethnic in/out-groups (Luther et al., 2021; Rule et al., 2008, 2009). In conclusion, although sexual orientation is not completely ‘invisible’, the presumption of heterosexuality means that, when only visual cues are available, it is much more likely to be overlooked or misjudged than binary gender and ethnicity. Although there is not much research on non-binary genders, we expect them to be equally or more imperceptible to the eye than non-heterosexual sexual orientation. For all these reasons, we expect that the bias-liberating effect of visual cues will be much clearer for binary genders and ethnic ingroups/outgroups than for LGB and non-binary people.
However, it is important to note that the lack of visibility for these social categories can also lead to systemic discrimination and marginalization, as individuals may feel forced to conceal their identity in order to avoid discrimination and harassment.

In brief, we anticipate two outcomes in this paper:

  1. Text cues (vis-à-vis visual cues) will quieten social discriminatory preferences.

  2. This effect will be present only for visible social categories such as binary gender and ethnicity. In the case of sexual and gender minorities, we expect the visual effect to be nuanced or reversed (Footnote 2).

Research Design

In order to test these hypothesized effects, we followed a two-stage experimental design. As displayed in Fig. 1, we randomly assigned 2324 German voters to four conjoint experimental groups. In every experimental group, respondents had to choose the politicians they prefer. However, the information respondents received and how this information was conveyed to them varied by experimental group. This strategy allows us to estimate the causal effect of these changes on respondents’ choices.

Fig. 1 Experimental groups

The conjoint experiments varied considerably between experimental groups. The first major difference between experimental groups was how demographic information was presented to respondents. As displayed in Fig. 1, groups 1a and 1b received explicit labels of social group memberships, while groups 2a and 2b received the same information through pictures of the politicians. Scholars have noted that respondents might react differently depending on whether the information is conveyed through an explicit label or a picture (Abrajano et al., 2018; see Footnote 3).

The attributes that we either conveyed through an explicit label or implicitly through a picture are ‘Ethnicity’, ‘Gender’ and ‘Sexual Orientation’. The levels of these attributes are listed in Table 1. The levels for ‘Ethnicity’ were ‘German’ and ‘Turkish’.

Table 1 Politician attributes and levels in the conjoint experiment

About 2.8 million people of Turkish descent live in Germany, which makes Turks the largest minority in the country. Furthermore, people of Turkish background are substantially underrepresented in Germany (Bloemraad, 2013). As ethnicity is easily conveyed through family names, we used these to convey ethnicity in groups 1a and 1b. Hence, we assigned a common German or Turkish family name to each profile. We also provided a gender-neutral first name in the form of a single-letter initial (see Appendix Table 2). The ‘Gender’ attribute features the levels ‘Male’, ‘Female’ and ‘Diverse’. We opted to use ‘Diverse’ as the label for non-binary gender identities because Germans have been allowed to designate their gender as ‘Diverse’ in their IDs since 2018. This is therefore the term used in most official and formal documents, and respondents should be most familiar with ‘Diverse’ as a term for non-binary identities. The ‘Sexual Orientation’ variable has two levels, ‘heterosexual’ and ‘LGB’. In the implicit experimental groups 2a and 2b, this information was instead conveyed through pictures of politicians.

While political scientists have used pictures to convey one conjoint attribute visually (Abrajano et al., 2018), our challenge was to encode three different conjoint attributes in a picture of a politician, namely, gender, sexual orientation, and ethnicity. We achieved this goal through a multistage process. First, we selected 200 pictures of faces out of a large pool of AI-generated faces. These pictures were selected so that the depicted persons’ pose, age, and facial expression would be comparable. Using AI-generated faces allowed us to select from a large pool of comparable faces and alter specific attributes of these faces. These 200 faces were then evaluated by 500 German voters through an online survey (Footnote 4). Based on the data generated through this pretest survey, we selected comparable faces. We began by selecting pictures with comparable ratings of competence, attractiveness, and trustworthiness based on the voters’ ratings. We specifically ended up with pictures that had an average rating for these traits between 5.4 and 6.4 on a scale from 0 to 10. As we wanted to ensure that the ethnicity of the depicted politician was not ambiguous, we excluded all pictures that were not clearly identified as either ‘German’ or ‘Turkish’. For the ‘Gender’ attribute, we proceeded by grouping the pictures into either the ‘male’ or ‘female’ level if voters unambiguously rated a picture as either male or female on a 1–10 scale. If the voters were consistently ambiguous in their picture rating, we grouped it into the ‘diverse’ level of the gender conjoint attribute. We acknowledge that being perceived as someone of ambiguous gender is not equivalent to being non-binary. However, given that we want to capture the prejudice towards dissident gender identity, something that depends on the respondents’ perception, we chose this way of proceeding.
For the ‘Sexual Orientation’ attribute, we grouped faces either in the ‘heterosexual’ or ‘LGB’ level using the same method as with the gender identity. As these AI-generated pictures do not have a ‘true’ sexual orientation respondents could detect, we essentially measured how certain physical traits are used as a heuristic by respondents to classify someone as heterosexual or non-heterosexual. Given the normative bias of perceiving people’s faces as heterosexual, the category LGB is based on the pictures that are less often perceived as heterosexual. After this selection process, we were left with 53 pretested AI-generated faces that were similar in terms of their valence and attractiveness but varied by ethnicity, gender, and sexual orientation. Given the combination of two attributes with two levels and one attribute with three levels, there were 12 possible combinations of picture attribute levels. We, therefore, had about four pictures per attribute level combination we could randomly present to respondents. For a description of evaluations of pre-pilot respondents of each combination, see Table A2 in the Appendix.
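The count of picture cells described above can be checked with a short enumeration. The following is only an illustrative sketch: the level labels are taken from the text, while the variable names are hypothetical and not part of the original study materials.

```python
from itertools import product

# Attribute levels as named in the text; variable names are illustrative.
ETHNICITY = ["German", "Turkish"]
SEXUAL_ORIENTATION = ["heterosexual", "LGB"]
GENDER = ["male", "female", "diverse"]

# Every combination of the three visually encoded attributes.
combinations = list(product(ETHNICITY, GENDER, SEXUAL_ORIENTATION))

n_pictures = 53  # pretested faces retained after the selection process
pictures_per_cell = n_pictures / len(combinations)

print(len(combinations))            # 12
print(round(pictures_per_cell, 1))  # 4.4
```

An average of roughly 4.4 pictures per cell is consistent with the "about four pictures" per combination reported in the text; the actual number of pictures in each cell is not stated here and may vary.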

While the experimental groups diverged in whether the ‘Ethnicity’, ‘Gender’, and ‘Sexual Orientation’ attributes were conveyed through labels or pictures, all groups received information on the politicians’ valence through labels. This has two functions. The first is to nudge respondents to think about elections and politics so that they can access the political information in their minds. The second is to expand the focus beyond sociodemographic characteristics and, therefore, make it less evident to respondents that we were interested in measuring prejudice. The valence attributes were ‘political experience’ and ‘political record’. The levels of these attributes are also displayed in Table 1. Political experience had three levels, indicating varying degrees of experience in local politics. The ‘political record’ attribute conveyed how effective the politician was in delivering campaign promises. This attribute also had three levels.

The experimental groups varied in one final aspect. Groups 1b and 2b received information on the politicians’ party membership, while groups 1a and 2a did not. This was done to see how the perception of the partisanship of a politician intersects with potential gender, ethnicity, and sexuality-based discrimination. Previous research has shown how including party labels mediates discrimination in similar conjoint analyses (López Ortega, 2023; Sen, 2017). This party attribute has four levels: right-wing CDU member, moderate CDU member, centrist GREENs member, and left-wing GREENs member. We chose to include only two parties instead of all German parties because, according to the polls before we conducted the survey, the CDU and the GREENs were the only two parties that could field electorally viable candidates in most constituencies. In order to increase the ideological variation of politicians presented to the respondents, we decided to include a level for a moderate and a more extreme member of each party.

The 2324 respondents to the experiments were recruited from an online panel of German adults. Each respondent had to complete five rounds of choice tasks, each comparing two candidate profiles (A and B), which leaves us with 23,240 evaluated profiles (2324 × 5 × 2).

Case Selection

As already mentioned, we conducted this experiment in Germany. Germany is arguably a typical case for the representation of women, immigrant-origin individuals, and LGBTQ+ people. Compared to other European countries, Germans have average attitudes towards these groups. Generally, Germans have, on average, less favorable attitudes towards individuals of these social groups than other Western or Nordic Europeans, but more favorable attitudes than Southern and Eastern Europeans (Eurobarometer, 2019). Furthermore, the survey was conducted in June 2021, 3 months before the national elections. Therefore, we expect respondents to be quite politicized, which helps the validity of our experiment. This makes Germany a compelling case to study.

Germany has a mixed-member proportional electoral system, which mixes majoritarian and PR elements. In general, fewer women are elected through the majoritarian section of the electoral system. Furthermore, immigrant candidates also do better in the PR element of the electoral system (Wüst, 2014). A more extensive discussion of the literature on the discrimination and representation of disadvantaged social groups in Germany can be found in the Appendix Sect. A.2.

Results

In recent years, different strategies have been identified for analyzing the results of conjoint experiments, the most common being the reporting of Average Marginal Component Effects (AMCE; Hainmueller et al., 2014). AMCEs are estimated by running OLS regressions with standard errors clustered by respondent, given that each respondent was confronted with five consecutive tasks. The dependent variable is the binary choice variable (candidate A or B), and the independent variables are the randomized attribute levels of the candidates.
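To make the AMCE logic concrete, the sketch below simulates a forced-choice conjoint with a single attribute and estimates the AMCE of each level relative to a reference level. It rests on several assumptions that are not from the study: the data and utility values are made up, the AMCE is computed as a simple difference in choice rates (which matches an OLS regression on that attribute's dummies when levels are randomized independently), and a respondent-level bootstrap stands in for the cluster-robust standard errors mentioned above. All function names are hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(0)

LEVELS = ["male", "female", "diverse"]  # reference level: "male"
# Made-up latent utilities used only to generate toy data.
TRUE_UTILITY = {"male": 0.0, "female": 0.5, "diverse": -0.5}


def simulate(n_respondents=400, n_tasks=5):
    """Return rows (respondent_id, level, chosen) for paired forced choices."""
    rows = []
    for rid in range(n_respondents):
        for _ in range(n_tasks):
            pair = [random.choice(LEVELS) for _ in range(2)]
            utils = [TRUE_UTILITY[g] + random.gauss(0, 1) for g in pair]
            winner = 0 if utils[0] >= utils[1] else 1
            for i, level in enumerate(pair):
                rows.append((rid, level, 1 if i == winner else 0))
    return rows


def amce(rows, level, reference="male"):
    """Difference in choice rate between `level` and the reference level."""
    p_level = mean(c for _, g, c in rows if g == level)
    p_ref = mean(c for _, g, c in rows if g == reference)
    return p_level - p_ref


def cluster_bootstrap_se(rows, level, n_boot=200):
    """Standard error from resampling whole respondents (clusters)."""
    by_resp = {}
    for row in rows:
        by_resp.setdefault(row[0], []).append(row)
    ids = list(by_resp)
    estimates = []
    for _ in range(n_boot):
        drawn = random.choices(ids, k=len(ids))
        sample = [r for rid in drawn for r in by_resp[rid]]
        estimates.append(amce(sample, level))
    return stdev(estimates)


rows = simulate()
for level in ("female", "diverse"):
    print(level, round(amce(rows, level), 3),
          round(cluster_bootstrap_se(rows, level), 3))
```

Resampling respondents rather than individual profiles respects the fact that the ten profiles rated by one person are not independent, which is the same concern that motivates clustering the standard errors in the regression approach.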

However, scholars have debated whether AMCEs are an effective way to report results of a conjoint experiment (Bansak et al., 2021; Leeper et al., 2020). The fact that AMCEs require a reference category has been identified as a possible source of problems when comparing the results of a conjoint experiment across subgroups of the sample. Depending on the attribute level set as a reference, the interpretation of the results may vary. In this regard, Leeper et al. (2020) have proposed an alternative to AMCEs, namely, reporting Marginal Means (MMs). These represent the mean outcome across all appearances of a particular conjoint feature level, averaged across all other features. For the sake of transparency, we present in this section both (1) the AMCEs of each disadvantaged trait (relative to the advantaged norm, i.e., men, White, heterosexual) and (2) the differential MMs. In the Appendix Sect. A.4 we include regression tables with a full description of the quantities presented in the figures: AMCEs and the marginal difference between MMs. As we are testing multiple hypotheses at the same time, we also report the Benjamini–Hochberg correction for the benchmark model to adjust for multiple comparisons and thereby control the false discovery rate (Hlatky, 2023; Liu & Shiraito, 2023). In the Appendix Sect. A.5, we also provide tests of our findings’ robustness. The plots also report 95% confidence intervals.
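The two quantities introduced above, the difference in marginal means and the Benjamini–Hochberg adjustment, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the toy rows and p-values are invented, and the function names are hypothetical.

```python
def marginal_mean(rows, level):
    """Mean choice rate over all profiles showing `level`.

    `rows` is a list of (level, chosen) pairs, with chosen in {0, 1}.
    """
    outcomes = [chosen for lvl, chosen in rows if lvl == level]
    return sum(outcomes) / len(outcomes)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented toy data: (gender level, chosen) under each treatment mode.
text_rows = [("female", 1), ("female", 1), ("female", 1), ("female", 0),
             ("male", 0), ("male", 1)]
visual_rows = [("female", 1), ("female", 0), ("female", 1), ("female", 0),
               ("male", 1), ("male", 0)]

# Differential marginal mean, defined as visual minus text.
diff_female = marginal_mean(visual_rows, "female") - marginal_mean(text_rows, "female")
print(diff_female)  # -0.25

# Invented p-values for four simultaneous comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw_p)])  # [0.04, 0.0533, 0.0533, 0.2]
```

Because marginal means need no reference category, the visual-minus-text difference for a level can be read directly. In the toy p-values, two comparisons that fall below 0.05 before adjustment no longer do so afterwards, which is why the figures report both sets of significance markers.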

Our main inquiry is whether individual preferences depend on whether voters perceive candidates' identity through visual cues or text cues. The left side of Fig. 2 presents the AMCEs for our primary categories of interest: ethnicity, gender, and sexuality. The findings indicate that, in comparison to white candidates, those from ethnic minorities encounter a consistent disadvantage, irrespective of whether their identity is depicted visually or in writing. Notably, this disadvantage becomes statistically insignificant in the subset of participants who were provided with additional information about the candidates' party affiliations. The right side of the graph, in turn, shows no significant disparity in the marginal means between ethnic groups in either textual or visual scenarios, suggesting that discrimination against ethnic minorities persists regardless of the mode of presentation. These results defy our expectations and are at odds with the prior U.S.-based study by Abrajano et al. (2018), which reported notable disparities when ethnicity was conveyed through visual as opposed to textual representations.

Fig. 2 AMCE results for respondents exposed to text and visual cues, and the difference in marginal means between them. The wide thick line indicates the results for all respondents. The dark thin line indicates the results for respondents who did not receive party cues, while the pale thin line indicates the results for respondents who did receive party cues. Left panel represents the differential result of the marginal means of visual minus text stimuli. Regarding significance levels, two sets are reported: the first set, indicated by confidence levels, is as follows: ***p < 0.001, **p < 0.01, and *p < 0.05. The second set, appearing after the “/”, indicates significance levels adjusted by the Benjamini–Hochberg correction, maintaining the same confidence level symbols

In examining gender, we set the ‘man’ category as a reference and compared evaluations of women and non-binary candidates. Non-binary candidates face bias when identified via text but not through visual cues; this distinction becomes evident when analyzing the difference in marginal means, which is particularly robust among respondents not informed about party affiliations. This observation offers support to our hypothesis that non-binary candidates encounter a greater text-based bias. Women seem to be always preferred over men, although the effect fades in the case of respondents who were exposed to visual gender traits and party labels. Relative to text cues, however, women are penalized when their social category is revealed through images, in accordance with our theory on the impact of visual cues. To be clear, women are still preferred over men in the presence of visual cues, but this gap is significantly smaller than when respondents are exposed to text cues. It is important to note that, after adjusting for multiple comparisons, this effect is statistically significant only in the absence of party labels.

Regarding sexuality, LGB candidates are notably less favored compared to heterosexual candidates when identity is text-based. The results are also negative with visual cues, but in that case the comparison against heterosexuals is not statistically significant. The difference in marginal means between LGB and heterosexual candidates is only significant before adjusting for multiple comparisons, and in the combined model that accounts for both party and no-party labels. Overall, for “visible” social categories such as binary gender, a visual penalty is evident, but this is not the case for ethnic minority candidates. In contrast, “invisible” identities like being non-binary experience more discrimination from text than visuals, while the pattern is less discernible for LGB identities.

One might wonder why the visual effect is null for ethnicity. To explore this question in greater depth, we analyze these same results divided by ideology. As can be seen in Fig. 3, the null effect of ethnicity on the overall population is due to the interaction of ideology. The data shows that left-leaning individuals exhibit significantly less bias against non-white candidates when presented with visual cues. Conversely, this bias disappears among centrist and right-leaning voters. Moreover, the effect among left-wing respondents is only considered significant after correcting for multiple comparisons when party affiliation is not indicated. Gender results are mixed. While non-binary people seem to be less penalized by the “invisibility” of visual cues among leftist and centrist respondents and in the absence of party cues, in the case of women we see that it is center voters who significantly prefer women when cues are textual instead of visual. Finally, as far as LGB candidates are concerned, the penalizing effect of text seems to work only among center and right-wing voters, although it fades after adjusting for multiple comparisons. These results highlight how people from ethnic outgroups and women are both less chosen when exposed to visual cues under certain circumstances, and that it is not always the same ideological groups that moderate this effect. For candidates of a sexual or gender minority, something similar happens but in reverse: visual cues tend to benefit them, due to the invisibility of their identity, although for LGB people this effect is less clear and is manifested particularly among center and right-wing voters.
These heterogeneous results are likely due to the fact that right-wing voters are more in favor of preserving the existing social order (Kreindler, 2005; Whitley, 1999), which favors candidates with traditional traits such as being white, male, and heterosexual, and are less inclined to control their prejudice than left-wing progressive voters (Legault et al., 2007).

Fig. 3 AMCE results for respondents exposed to text and visual cues, and the difference in marginal means between them, divided by ideological groups. The wide thick line indicates the results for all respondents. The dark thin line indicates the results for respondents who did not receive party cues, while the pale thin line indicates the results for respondents who did receive party cues. Left panel represents the differential result of the marginal means of visual minus text stimuli. Regarding significance levels, two sets are reported: the first set, indicated by confidence levels, is as follows: ***p < 0.001, **p < 0.01, and *p < 0.05. The second set, appearing after the “/”, indicates significance levels adjusted by the Benjamini–Hochberg correction, maintaining the same confidence level symbols

A pertinent question is what happens when social categories become intertwined. We address this question in Fig. 4. We find that the clearest discrimination is concentrated in candidates with threefold diversity learned through text cues. In the case of non-binary candidates, results show that discrimination is persistent when they are not only non-binary but also non-white or LGB. Specifically, when candidates are non-binary and LGB, the gap between text and visual cues is significant. The results become especially interesting when the focus is placed on women with various intersections. In line with the theory of motivation to control prejudice, we find that lesbian/bisexual women are viewed more negatively when their traits are specified with text than when they are specified with pictures. These findings suggest that the favorability of some women may not exist when the priming effect is left out of the equation.

Fig. 4

AMCE results for respondents exposed to text cues and visual cues, as well as the difference in marginal means between both, for combinations of candidates’ social categories. The wide thick line indicates the results for all respondents. The dark thin line indicates the results for respondents who did not receive party cues, while the pale thin line indicates the results for respondents who did receive party cues. Left panel represents the differential result of the marginal means of visual minus text stimuli. Regarding significance levels, two sets are reported: the first set, indicated by confidence levels, is as follows: ***p < 0.001, **p < 0.01, and *p < 0.05. The second set, appearing after the “/”, indicates significance levels adjusted by the Benjamini–Hochberg correction, maintaining the same confidence level symbols

In sum, these results suggest a clear difference in the way respondents evaluate political candidates, depending on whether the cues are visual or textual. Evidence confirms our expectations that women, in general, are less preferred when respondents learn about them visually. This does not mean the pro-women gender gap does not exist, but it might be exaggerated by experiments that use text cues. This finding is robust to ethnic minority and lesbian/bisexual women. Results on ethnicity are mixed, with left-wing voters favoring non-white candidates in the presence of visual cues, and right-wing voters penalizing non-white candidates more when the cues are visual. This is partial evidence of our theory for visible social categories. With respect to invisible social categories such as being LGB or non-binary, we find robust evidence that visual cues penalize the electability of such candidates. Finally, something that we consistently find across our analyses is that the presence of party labels reduces the differential effects of visual cues vs. text cues, suggesting that social identities carry some inferential political power that vanishes when party affiliations are unveiled to respondents (López Ortega, 2023).

Conclusions and Discussion

The identity of political candidates is not straightforward. Voters learn the social categories of political candidates based on cues of various kinds which, we argue, impact voters’ preferences. Until now, however, the vast majority of research has overlooked whether visual cues have a different effect than text cues, and whether this effect is the same for all social categories. This is a critical gap, because voters perceive visual cues very differently from text cues. First, visual cues prime voters about their identity-based affinity. Second, visual cues do not allow voters to control their prejudice against candidates whose social categories are not traditionally associated with being a politician. Third, not all social categories are “read” in the same way. There are categories that, although not unequivocal, are visible, such as gender binary and ethnic ingroup or outgroup membership. People easily place politicians on one side or the other when they see them. Other social categories such as being LGB or being non-binary are invisible, i.e., their non-membership is assumed and therefore mere visual cues do not induce the discrimination to which they would otherwise be subjected if their identity were specified with text labels.

We sought to shed light on this by employing a first-of-its-kind visual conjoint experiment and comparing it to a traditional text conjoint experiment. We divided respondents so that one group received the social categories of gender, ethnicity and sexual orientation described with text, as most conventional conjoints do, and another received the same categories displayed with AI-generated images together with other political variables. The results confirm our expectations that the way in which social categories are learned affects preferences. The fact that women are preferred over men to a much greater extent when there are text labels is a good illustration of this. Women still receive a bonus, as previous research has shown in Western countries, but it is substantially smaller when images are present. Non-binary or LGB candidates, although the latter in a much less conclusive manner, appear to be less disadvantaged by visual cues than by text labels. They are still less preferred compared to other genders or heterosexuals respectively, but, consistent with our theory, the discriminatory effect is softened.

The results on candidates' ethnic membership hint at the relevance of ideology. When we aggregate respondents across ideologies, we observe no differences between visual and textual preferences; when we divide the respondents by ideology, however, we find that the effect is more pronounced. While right-wing respondents reject non-white candidates to a greater extent when faced with visual cues, left-wing voters seem to prefer them to a greater extent when the cues are visual. The gender and sexual orientation results are also moderated by ideology. This highlights that the effect of treatment mode is complex and that, while effects may cancel out at first glance, heterogeneity may be revealed by ideological groups, most likely because of their differences in predispositions to prejudice (Legault et al., 2007), as previous research has also shown in the case of LGBT+ candidates (Magni & Reynolds, 2021a, b). Our results also highlight the effects of social categories when they intersect. An example of this is how the bonus that women receive disappears when cues are visual and women are lesbian or ethnically outgroup, or how the bonus is reversed when women are lesbian and non-white, regardless of whether the cues are text or visual. Additionally, the introduction of party labels diminishes the differential impact of visual versus textual cues, suggesting that social identities exert a certain inferential political influence, which diminishes once party affiliations are disclosed to respondents, congruent with what is found by López Ortega (2023).

This study has some limitations. Specifically, we identify two that we addressed only in part in our paper. The first has to do with the explicitness of signals. The study employs an implicit signal of ethnicity in both treatment conditions (using names vs. pictures). In contrast, the textual signals for sexual orientation and gender are explicit, as we clearly identify candidates as LGB, women, or non-binary. It is anticipated that the outcomes might differ with cues that evoke stereotypes associated with LGBTQ+ individuals, such as clothing, body language, voice tone, or a combination of these. Exploring these variations in future research could provide valuable insights. The second limitation has to do with the strength of visual signals. In the paper we took measures to ensure the comparability of the stimuli by pretesting and selecting AI-generated pictures that would correspond to the text labels. However, due to the difference in the nature of visual and text stimuli, it is impossible to rule out that differences in signal strength affect the results. Particularly for LGB and non-binary categories, we depend on subtle, pre-tested facial cues. These cues often carry assumptions of normativity that are especially pronounced regarding sexual and gender identities. Even when respondents may perceive someone as LGBTQ+ based on their face, this cue is arguably subtle and inconclusive. Furthermore, visual cues for binary gender and ethnicity may also differ in impact compared to using names that suggest ethnicity and gender, as utilized in this experiment. One way to address this limitation could be asking survey respondents who are exposed to visual cues to infer candidates’ identities; however, this would reinforce the signal for them vis-à-vis respondents subject to the text treatment, and therefore limit comparability. We encourage future researchers to address the issue of signal strength in more detail, not only in experiments that compare visual vs. text stimuli, but in any experiment. In non-experimental scenarios, citizens receive cues of candidates’ identity in multiple ways depending on the electoral context. The strength of the signal, and its specific content, may vary substantially depending on the cues that respondents are receiving, and also on the extent to which both prejudice and the motivation to control it get activated. While we have endeavored to select visual stimuli that minimize signal variability and potential confounding factors, we believe the field's understanding of the impact of visual cues could be further enhanced by developing more robust methods and detailed analyses to measure and interpret richer and more complex identity signals.

Our main implication is that treatment modes are not interchangeable, and researchers should choose between them carefully. While we do encourage the use of AI-generated pictures generally, as we believe that this approach holds great promise for political science applications, we do not recommend replacing text-based conjoint experiments with picture-based conjoint experiments indiscriminately. Generally, we advise conveying information about a politician or any evaluated object in the way the respondents would usually encounter it. This way researchers might be able to increase the external validity of experimental research.

Additionally, to the best of our knowledge, this is one of the first experimental studies on voters’ evaluation of non-binary candidates. We show that non-binary candidates face considerable obstacles in politics. We find, in line with findings on transgender politicians (Loepp & Redman, 2020; Magni & Reynolds, 2021a, b), that non-binary candidates might face more discrimination than LGB candidates: contrary to LGB candidates, non-binary candidates even face discrimination by voters of the left.